Commit Graph

723 Commits

Author SHA1 Message Date
Ronald S. Bultje
b72ecbb1b9 Add best_rd breakout in intra4x4 RD loop.
Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.

Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6
2013-07-24 09:02:05 -07:00
Ronald S. Bultje
47336afd8d Merge "More optimizations for cost_coeffs()." 2013-07-23 21:36:12 -07:00
Dmitry Kovalev
db7f5d28b9 Removing vp9_is_interpolating_filter array.
All filters are interpolating now, so we don't need this array, all
values from this array are evaluated to true.

Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
2013-07-23 14:24:39 -07:00
Dmitry Kovalev
2855d8aea1 Merge "Adding update_tx_counts function." 2013-07-23 13:57:59 -07:00
James Zern
8dede954c7 Merge "vp9: make some static tables const" 2013-07-23 11:37:01 -07:00
Jim Bankoski
86a9dec73c clean up bw, bh
many structures use bw and bh and they have different meanings.   This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...

also removed partition type multiple operation code in bitstream.c.

Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
2013-07-23 06:51:44 -07:00
Paul Wilkins
7c134bc0cd Merge "Reworked the auto_mv_step_size speed feature" 2013-07-23 04:49:55 -07:00
James Zern
3c8cce353f vp9: make some static tables const
Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
2013-07-22 19:19:13 -07:00
Ronald S. Bultje
e20fcd9585 More optimizations for cost_coeffs().
4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
2013-07-22 16:09:09 -07:00
Dmitry Kovalev
b2fc6fa969 Adding update_tx_counts function.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
2013-07-22 14:57:43 -07:00
Yaowu Xu
fc186dcad6 fix a build error
Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
2013-07-22 12:37:30 -07:00
Jingning Han
416f315e82 Merge "Skip buffer update in sub8x8 rd loop" 2013-07-22 12:08:22 -07:00
Jingning Han
a5a9f5f7f3 Merge "Optimize operation flow in sub8x8 rd loop" 2013-07-22 12:08:15 -07:00
Jingning Han
409e77f2d4 Optimize operation flow in sub8x8 rd loop
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.

This gives about 2% speed-up for bus_cif at 2000 kpbs, for speed 0.
Its efficacy depends how frequently the motion search will select an
integer motion vector.

Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
2013-07-22 10:40:33 -07:00
Paul Wilkins
1d189d6464 Re-order mode search in rd.
Mode search order in rd loop changed to better reflect
observed hit counts.

Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.

Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
2013-07-22 17:21:12 +01:00
Dmitry Kovalev
39342db138 Merge "Consistent names for inter mode probabilities and encodings." 2013-07-20 22:40:51 -07:00
Dmitry Kovalev
f66821afbb Merge "Removing frame_type field from MACROBLOCKD struct." 2013-07-20 22:40:06 -07:00
Jingning Han
c725502bf3 Skip buffer update in sub8x8 rd loop
This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation, when early breakout has been triggered
in the rd_check_segment_txsize. It provides about 1% speed-up for
bus_cif at 2000 kbps, in the settings of speed 0.

Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
2013-07-20 21:38:12 -07:00
Deb Mukherjee
302698fb12 Reworked the auto_mv_step_size speed feature
This patch modifies the auto_mv_step_size speed feature to
use a combination of the maximum magnitude mv from the last
inter frame, and the maximum magnitude mv for the two reference
mvs with the same reference. For arf frames, the max mav step
for the resolution is used.
The bounds therefore are slightly tighter. The feature is made
a speed 1 feature.

Rebased.

Results (when this feature is turned on over speed 0):
derfraw300: -0.046% psnr, about 5+% speedup
(tested on football: goes from 4m30.760s to 4m17.410s).

Change-Id: If492797a61b0b4b3e58c0b8f86afb880165fc9f6
2013-07-19 15:12:56 -07:00
Dmitry Kovalev
e71a4a77bb Merge "Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE)." 2013-07-19 12:14:32 -07:00
Dmitry Kovalev
97e96bc4e9 Removing frame_type field from MACROBLOCKD struct.
Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
2013-07-19 11:55:36 -07:00
Dmitry Kovalev
c0eb57406c Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE).
Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
tx_mode.

Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
2013-07-19 11:37:13 -07:00
Dmitry Kovalev
afe43d4089 Removing redundant VP9_COMMON* from function signatures.
Functions: vp9_get_pred_context_switchable_interp,
           vp9_get_pred_context_intra_inter,
           vp9_get_pred_context_single_ref_p1,
           vp9_get_pred_context_single_ref_p2.

Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
2013-07-19 11:20:49 -07:00
Dmitry Kovalev
bc7acb134b Consistent names for inter mode probabilities and encodings.
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.

Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
2013-07-19 10:40:04 -07:00
Yaowu Xu
37d901a47a Merge "Add best_rd breakout to keyframe partition selection also." 2013-07-18 17:50:39 -07:00
Yaowu Xu
67fb0679ee Merge "Merge scale_factors and scale_factors_uv." 2013-07-18 17:50:34 -07:00
Yaowu Xu
55b52e32da Merge "Do in-place UV intra mode selection." 2013-07-18 17:50:07 -07:00
Yaowu Xu
51972d1279 Merge "Change break statement in a 2d loop to a return statement." 2013-07-18 17:49:58 -07:00
Dmitry Kovalev
92f4198d52 Merge "Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT)." 2013-07-18 17:29:05 -07:00
Dmitry Kovalev
0b562b2d3d Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT).
Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
2013-07-18 15:12:46 -07:00
Ronald S. Bultje
96e4db2660 Add best_rd breakout to keyframe partition selection also.
Change-Id: I96b8058f6dfecf8aa3e152cdcbfd7e10071fbbc9
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
5ebe503f04 Merge scale_factors and scale_factors_uv.
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.

Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.

Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
df4b4fab26 Do in-place UV intra mode selection.
This means we only do UV intra mode selection if we find any intra
mode to actually be useful at all; in addition, we only do UV intra
mode selection for the transform sizes that were selected, rather
than all sizes available in this partition.

First 50 frames of bus @ 1500kbps (speed 0) gains about 5% with this
change.

Change-Id: I7b461eb8b803247f57896c5a9505f745b55502b3
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
e54a5782b9 Change break statement in a 2d loop to a return statement.
The break statement only breaks out of the nested loop, not the
top-level loop, so it doesn't always work as intended. Changing it
to a return statement does what's intended.

Change-Id: I585419823b39a04ec8826b1c8a216099b1728ba7
2013-07-18 14:10:56 -07:00
Ronald S. Bultje
2d4929e340 Remove motion vectors from PARTITION_INFO.
The same information already exists in union b_mode_info.

Change-Id: Iac5086b99a3c3cc270380138062bb693e58f9e6d
2013-07-18 14:10:52 -07:00
Ronald S. Bultje
9da67da04a Merge "Fix bug where we don't choose any mode in RD selection." 2013-07-18 12:47:50 -07:00
Ronald S. Bultje
247197d57b Fix bug where we don't choose any mode in RD selection.
This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.

Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868
2013-07-18 12:13:15 -07:00
Ronald S. Bultje
4f5815290c Merge "Fix bug which skips zeromv even if near/nearest is not 0,0." 2013-07-18 10:06:51 -07:00
Ronald S. Bultje
deb7456058 Fix bug which skips zeromv even if near/nearest is not 0,0.
Change-Id: Id4f454831f3f11099f39c30246adeaa52857d08d
2013-07-18 09:35:19 -07:00
Jingning Han
ced3c20165 Use mv_check_bounds in sub8x8 rd loop
Make the use of mv_check_bounds consistent for mvs of both ref_frame[0]
and ref_frame[1].

Change-Id: I1ca24865cc7232ca9cbe5db566c53abad1592211
2013-07-17 17:13:51 -07:00
Ronald S. Bultje
facecd80da Merge "Add a best_yrd shortcut in splitmv mode search." 2013-07-17 16:11:13 -07:00
Ronald S. Bultje
056111c822 Merge "Skip redundant nearest/near/zero encodes in splitmv." 2013-07-17 16:10:51 -07:00
Ronald S. Bultje
0b1eba25b2 Merge "Skip nearest/near/zero redundant encodes." 2013-07-17 16:10:41 -07:00
Ronald S. Bultje
607424449c Merge "Best_rd breakout in rd partition search." 2013-07-17 16:10:22 -07:00
Yaowu Xu
6ac5b7db2c Merge "changed mode checking order" 2013-07-17 14:44:40 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Ronald S. Bultje
c6917528a5 Add a best_yrd shortcut in splitmv mode search.
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
2013-07-17 14:21:57 -07:00
Ronald S. Bultje
161c995658 Skip redundant nearest/near/zero encodes in splitmv.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
2013-07-17 13:53:48 -07:00
Yaowu Xu
42facc292d changed mode checking order
Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
2013-07-17 13:43:50 -07:00
Ronald S. Bultje
8fea880b6f Skip nearest/near/zero redundant encodes.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
2013-07-17 11:33:15 -07:00
Ronald S. Bultje
9f427bfe98 Best_rd breakout in rd partition search.
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
2013-07-17 09:56:46 -07:00
Ronald S. Bultje
83c7e13a6b Do a skip-block check for sub8x8 partitions also.
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
2013-07-17 09:46:47 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
154c34a3ee Merge "Limit transform sizes searched for uv intra." 2013-07-17 03:40:11 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
Paul Wilkins
6c667f0ffe Limit transform sizes searched for uv intra.
Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
2013-07-17 11:08:55 +01:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to fine uv tx_size." 2013-07-17 02:50:09 -07:00
Jingning Han
a142d6fc93 Skip redundant motion search in 4x4 level rd loop
This commit makes the encoder to perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
2013-07-16 17:21:11 -07:00
Dmitry Kovalev
5b65a71cdc Changing signature of vp9_get_pred_probs_tx_size.
Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
2013-07-16 16:34:54 -07:00
Paul Wilkins
30d2ea45ce Minor cleanup in code to fine uv tx_size.
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
2013-07-16 18:27:33 +01:00
Dmitry Kovalev
ca75f1255f Removing and moving around constant definitions.
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
2013-07-15 19:26:30 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
Yaowu Xu
fb754b182f Fix a build issue
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
2013-07-12 11:38:44 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rathers than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Ronald S. Bultje
ee09dd9949 Remove unused function block_error().
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
2013-07-11 17:14:03 -07:00
Dmitry Kovalev
8c05e59065 Calling is_inter_mode() instead of custom code.
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
2013-07-11 14:14:47 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Jingning Han
18803f9cc4 Fix tx_type bug in intra4x4 rd loop
This commit fixed the mis-use of the tx_type for inverse transform
in intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980
2013-07-10 15:49:49 -07:00
Deb Mukherjee
7494bba66b Merge "Prunes out full-rd computation based on modeled rd" 2013-07-10 15:37:11 -07:00
Jim Bankoski
865ca76604 Merge "remove warnings when NDEBUG is set" 2013-07-10 14:39:39 -07:00
Jim Bankoski
6591cf2f7e remove warnings when NDEBUG is set
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
2013-07-10 14:27:20 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the  regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Resuts on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
Yaowu Xu
e52eec490c Merge "Add a feature to reduce chrome intra mode search" 2013-07-10 11:35:47 -07:00
Ronald S. Bultje
b1df674a99 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
2013-07-10 09:27:56 -07:00
Yaowu Xu
bed27a960a Add a feature to reduce chrome intra mode search
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
2013-07-10 08:59:18 -07:00
Jim Bankoski
fb027a7658 removing case statements around prediction entropy coding
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
2013-07-09 20:10:16 -07:00
Yaowu Xu
205efbc153 Revert "Remove memcpy() in handle_inter_mode() filter selection."
This reverts commit fcf7998a47.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
2013-07-09 17:42:10 -07:00
Ronald S. Bultje
204d1b7058 Merge "Unbreak lossless." 2013-07-09 09:54:48 -07:00
Ronald S. Bultje
059c0ba5d4 Unbreak lossless.
Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
2013-07-09 09:46:37 -07:00
Dmitry Kovalev
1c65c580d6 Merge "Refactoring setup_pre_planes function." 2013-07-08 20:08:05 -07:00
Ronald S. Bultje
8fde07a3ae Don't recalculate mv_ref costs for each block/partition.
Changes cost_mv_ref() into doing a LUT into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
fcf7998a47 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
ed995afba1 Make frame-wide filter-type decision fully RD-based.
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
2013-07-08 16:22:37 -07:00
Deb Mukherjee
d9b62160a0 Implements several heuristics to prune mode search
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
2013-07-08 12:17:12 -07:00
Paul Wilkins
ef0ca2deaa Merge "Fix to comp_inter_joint_search_thresh feature." 2013-07-04 03:27:00 -07:00
Dmitry Kovalev
f72e072555 Refactoring setup_pre_planes function.
Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63
2013-07-03 17:42:01 -07:00
Jingning Han
68172dbede Merge "Enable early termination in rd search" 2013-07-03 14:20:41 -07:00
Jingning Han
2bd6fe08f8 Enable early termination in rd search
This commit allows encoder to detect the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the rest
operations and continue to next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 second
to 266 second, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
2013-07-03 12:54:18 -07:00
Paul Wilkins
f58b44ad62 Fix to comp_inter_joint_search_thresh feature.
When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
2013-07-03 16:58:34 +01:00
Paul Wilkins
72c5778ec5 Added two new skip experiments.
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
2013-07-03 16:56:06 +01:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Deb Mukherjee
37501d687c Speed feature to binary search dir intramodes
This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
2013-07-02 14:07:19 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Jingning Han
b91a1586a3 Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
2013-07-02 09:58:46 -07:00
Paul Wilkins
b7cd01ed73 Revert "New motion threshold factor - speed feature."
This reverts commit 1377278180.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
2013-07-02 15:06:40 +01:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Paul Wilkins
1377278180 New motion threshold factor - speed feature.
Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1.  This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
2013-07-01 12:11:21 +01:00
Ronald S. Bultje
bc70c60b25 Merge "fixed a bug where sse is not populated" 2013-06-29 07:42:41 -07:00
Yaowu Xu
f853e662b7 fixed a bug where sse is not populated
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
2013-06-28 17:10:22 -07:00
Ronald S. Bultje
d00b8e5f82 Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
2013-06-28 10:40:21 -07:00
Ronald S. Bultje
e3ce2b2ab3 Minor change to prevent one level of dereference in cost_coeffs().
4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles

Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
2013-06-28 10:29:07 -07:00
Ronald S. Bultje
91d223bd5c Some minor optimizations for cost_coeffs().
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 23426 -> 18134 cycles
32x32: 4906 -> 3664 cycles

Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.

Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
2013-06-28 10:29:02 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Yaowu Xu
8b9eea0a34 Minor cleanups
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
2013-06-28 09:19:50 -07:00
Paul Wilkins
05ffdf2625 Merge "Auto adapt step size feature." 2013-06-27 02:28:41 -07:00
Paul Wilkins
59af9049d3 Merge "Start adaptive threshold for each mode at max." 2013-06-27 02:28:36 -07:00
Paul Wilkins
5bcf069c6b Merge "Change meaning of cpi->sf.first_step and rename." 2013-06-27 02:28:21 -07:00
Jingning Han
fc1cfd8e32 Merge "Make intra predictor reference buffer configurable" 2013-06-26 19:02:02 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Paul Wilkins
9f3ab83486 Auto adapt step size feature.
Also tweaks to other features and experiments with
what is on and off at different speed settings.

Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
2013-06-26 19:48:39 +01:00
Dmitry Kovalev
49dee16879 Merge "Using get_plane_block_{width, height} instead of custom code." 2013-06-26 10:23:27 -07:00
Paul Wilkins
689957e3ad Start adaptive threshold for each mode at max.
Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.

Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
2013-06-26 17:04:47 +01:00
Paul Wilkins
e606cac046 Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.

The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.

Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
2013-06-26 17:04:06 +01:00
Jingning Han
d19ea3861d Refactor intra predictor block
Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.

Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
2013-06-25 16:33:13 -07:00
Dmitry Kovalev
dc0f457c94 Using get_plane_block_{width, height} instead of custom code.
Change-Id: I453ed11b965e857a14c18ea5c0f4a0a48e7dc0d9
2013-06-25 14:11:18 -07:00
Dmitry Kovalev
87ee34aacb Removing unused code.
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.

Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
f27f76dfb3 Transforming scale_mv_component_q4 into scale_mv_q4 function.
Using MV instead of int_mv for function arguments.

Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Ronald S. Bultje
54b2a59623 Implement SSE2 block_error.
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.

Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.

Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21 12:54:52 -07:00
Yaowu Xu
ee07a261a0 rename variables to avoid build error in MSVC
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
2013-06-20 18:31:48 -07:00
Deb Mukherjee
7947a33d72 Improving model rd with variance and quant step
Improves the rd modeling function and implements them using interpolation
from a table which is a little faster. Also uses sse as input to the
modeling function rather than var - since there is no dc prediction
used and as a result the sse works a little better.

derfraw300: +0.05%
Speedup: ~1%

Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
2013-06-20 10:06:28 -07:00
Jim Bankoski
1f94b97694 convert all speed things to speed features
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
2013-06-20 09:42:44 -07:00
Yaowu Xu
12180c8329 Remove unnecessary copying of probs.
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
2013-06-18 23:02:27 -07:00
Deb Mukherjee
4ad96115cd Some cleanups in rd motion search
No bitstream or output change - only cosmetics.

Change-Id: Ic8c1d7ad010a87dcf27d12a38cd7dd5adba683a7
2013-06-13 17:25:23 -07:00
Deb Mukherjee
f18328cbf1 Adds a zero check in model_rd function
Avoids divide-by-zero when variance is 0.

Change-Id: I3c7f526979046ff7d17714ce960fe81d6e1442a0
2013-06-10 17:04:47 -07:00
John Koleszar
717d744a01 Fix use of get_uv_tx_size in loopfilter
Change the argument of get_uv_tx_size() to be an MBMI pointer, so that the
correct column's MBMI can be passed to the function.

Change-Id: Ied6b8ec33b77cdd353119e8fd2d157811815fc98
2013-06-10 11:40:57 -07:00
Paul Wilkins
de6ec27d1a Rd check on segment level reference mode.
Do not allow the rd code to check compound modes if
a segment level reference frame is selected.

Change-Id: I95f0c57789e0eaceed7caf227e94b4ba3130a06c
2013-06-10 11:03:15 -07:00
Ronald S. Bultje
b12a8dac98 Allow non-zeromv if ref_frame=intra with segmentation skip/ref enabled.
Change-Id: Ib5a95bb6ab643b276df3faa9bf99595e4a69ff18
2013-06-10 10:55:10 -07:00
Tero Rintaluoma
86bb6df005 Fixed point reference picture scaling
Fixed point scaling factors are calculated once for each
reference frame by using integer division. Otherwise fixed point
scaling routines are used in all scaling calculations. This makes it
possible to calculate fixed point scaling factors on device driver
software and pass them to hardware and thus avoid division on hardware.

TODO:
 - Missing check for maximum frame dimensions
   (currently scaling uses 14 bits)
 - Missing check for maximum scaling ratio
   (upscaling 16:1, downscaling 2:1)

Problems:
 - Straightforward fixed point implementation can cause error +-1
   compared to integer division (i.e. in x_step_q4). Should only
   be an issue for frames larger than 16k.

Change-Id: I3cf4dabd610a4dc18da3bdb31ae244ebaf5d579c
2013-06-10 08:07:55 -07:00
Deb Mukherjee
21401942b0 Coding tx-size selection by use of spatial context
Adds coding of transform size within a frame by use of context
of transform sizes selected in left and above blocks.

Also incorporates code for generating stats.

TODO: generate and incorporate new default stats

Change-Id: I6a7af099f6ad61d448521d9a51167aedaf638ed6
2013-06-07 16:07:58 -07:00
Paul Wilkins
340c7a48e6 Change to segment ref frame feature.
Simplify feature to only support a single reference frame
instead of a mask.

Change-Id: I5dd3a98c7a224aafb35708850ab82e2f220e68fb
2013-06-07 21:42:22 +01:00
Deb Mukherjee
3ee1a21a42 Coding updates for tx-size selection
Changes to the coding of transform sizes, along with forward
and backward probability updates.

Results:
derf300: +0.241%

Context based coding of transform sizes will be in a separate
patch.

Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
2013-06-07 08:54:00 -07:00
Ronald S. Bultje
6ef805eb9d Change ref frame coding.
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.

Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
2013-06-06 17:28:09 -07:00
Ronald S. Bultje
ad34368786 New intra mode and partitioning probabilities.
Split partition probabilities between keyframes and non-keyframes,
since they are fairly different. Also have per-blocksize interframe
y intramode probabilities, since these vary heavily between different
blocksizes.

Lastly, replace default probabilities for partitioning and intra modes
with new ones generated from current codec. Replace counts with actual
probabilities also.

Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
2013-06-06 10:45:30 -07:00
Jingning Han
d03e974fbd Bug fix in rd_pick_inter_mode_sb_
Fix the calculation of step size in height.

Change-Id: I0e0c0175f141f5a41214ae51cef233d13942d3c5
2013-06-06 10:04:26 -07:00
Paul Wilkins
26e24b1dd7 Merge "Rd thresholds change with block size." into experimental 2013-06-06 09:27:44 -07:00
Paul Wilkins
02590a5b1b Merge "Turn off compound inter search refinement for good quality." into experimental 2013-06-06 09:27:31 -07:00
Jim Bankoski
b4c4f64862 signs reverted
Change-Id: Ieface458c83eb6e7ee95595d9fc662f372117c9a
2013-06-06 08:59:22 -07:00
Paul Wilkins
c3316c2bc5 Rd thresholds change with block size.
Added structures to support independent rd thresholds
for different block sizes (and set experimental block
size correction factors).

Added structure to to allow dynamic adaptation of thresholds
per mode and per block size basis depending on how often
the mode/block size combination is seen (currently fixed factor).

Removed some unused variables.

TODO
- Adaptation of thresholds based on how often each mode chosen.
- The baseline mode values could also be adjusted based on
  the block size (e.g. for a particular intra mode use a low threshold
  for 4x4 prediction blocks but a relatively high value for 64x64.

Change-Id: Iddee65ff3324ee309815ae7c1c5a8584720e7568
2013-06-06 15:45:53 +01:00
Paul Wilkins
c880e02f97 Turn off compound inter search refinement for good quality.
Turn this feature off for some modes in  "good" quality.

Change-Id: I3f262d62cca8f01736b977af1465291e8be29f0a
2013-06-06 15:44:25 +01:00
Jim Bankoski
5a88271b09 don't tokenize & encode tokens for blocks in UMV
This avoids encoding tokens for blocks that are entirely
in the UMV border. This changes the bitstream.

Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
2013-06-06 06:10:25 -07:00
Jingning Han
61e6586230 Merge "Fix UV intra coding rd loop" into experimental 2013-06-05 21:47:00 -07:00
Jingning Han
f04b15486a Fix UV intra coding rd loop
This commit makes the coding/reconstruction operations of intra
coding rate-distortion loop for UV components consistent with those
of the encoding process.

key frame coding gains:
derf:   0.11%
stdhd:  0.42%

Change-Id: I8d49f83924a320e3689ef2d60096c49d7f0c7a40
2013-06-05 21:18:02 -07:00
Deb Mukherjee
30226a658f Cosmetic renaming VP9_MVREFS to VP9_INTER_MODES
NO bitstream change

Change-Id: I79f6146dac5fdd157051b6f8dc611c0b7b5e5f7f
2013-06-05 11:24:01 -07:00
Jingning Han
513d326d75 Merge "Make sb intra rd search consistent with encoding" into experimental 2013-06-04 14:59:05 -07:00
Jingning Han
51b6e73a68 Make sb intra rd search consistent with encoding
This commit makes operations of the superblock intra coding rate
distortion optimization consistent with those used in the encoding
process. Given the test prediction mode and transform size, the rd
optimizer encodes and reconstructs each transformed block of the
superblock consecutively, then computes the total rate-distortion
costs accosicated with the current superblock to select the coding
decisions.

It achieves coding performance gains:
derf 0.353%
yt   1.111%

Change-Id: I0da2eb7a71361dfb8c1384927fc536b0c2790d07
2013-06-04 13:54:48 -07:00
Dmitry Kovalev
6a961e7dc8 Merge "Replacing memcpy with struct assignment." into experimental 2013-06-03 14:32:05 -07:00
Jingning Han
9068bce4e7 Put iterative motion search under speed control
Enable iterative motion search for compound inter-inter prediction
of block sizes 4x4/4x8/8x4 only when best coding quality is selected.
The iterative motion search provides about 0.1% gains for derf and
stdhd at this point, at the expense of longer runtime.

Change-Id: Idc03e7f827e51f1bb8d269bc3752ee297a6bbfe5
2013-06-03 09:18:57 -07:00
Dmitry Kovalev
3b9ec31eaf Replacing memcpy with struct assignment.
Change-Id: Ib557cc6351404b9e178e95a545883eb3666f11f0
2013-05-31 16:00:32 -07:00
Dmitry Kovalev
317d832d38 Merge "Adding plane_block_width and plane_block_height functions." into experimental 2013-05-31 15:28:45 -07:00
Deb Mukherjee
0048ec2329 Costing fixes related to trellis optimization
Migrates costing changes/fixes from the rebalance expt to the head
without the expt on.

Rebased.

Change-Id: I51677d62f77ed08aca8d21a4c9a13103eb8de93f
Results:
derfraw300: +0.126%
2013-05-31 13:56:32 -07:00
Dmitry Kovalev
120a878199 Adding plane_block_width and plane_block_height functions.
Change-Id: I02c17fb733c0f3c22dc3167c3d3182797415f1ae
2013-05-31 12:31:49 -07:00
Ronald S. Bultje
a288cb3b10 Merge "Merge all various transform size data trackers into single variables." into experimental 2013-05-31 09:59:24 -07:00
Scott LaVarnway
1e025dbfd1 Merge "Moved use_prev_in_find_mv_refs check to frame level" into experimental 2013-05-31 09:35:51 -07:00
Ronald S. Bultje
e9d68a5e36 Merge all various transform size data trackers into single variables.
Change-Id: I2dfc569106b29fbe4da20585a0e85e5e9ea6a4db
2013-05-31 09:18:59 -07:00
Jim Bankoski
21595f8e38 Merge "Creates a new speed 1:" into experimental 2013-05-30 20:36:05 -07:00
Jim Bankoski
ced21bd6a6 Creates a new speed 1:
This speed 1 - uses variance threshold stolen from static-thresh
to determine split.  Any superblock with greater than the variance
set by static thresh * quantizer index squared is split. In addition
transform size is set to largest size less than or equal to partition
size, sub pixel filter is set to normal,  and only 12 modes are used
at all.

Change-Id: If7a2858ee70f96d1eb989c04fd87a332b147abef
2013-05-30 19:53:00 -07:00
Ronald S. Bultje
16482bddf7 Merge "Remove splitmv." into experimental 2013-05-30 19:07:12 -07:00
Ronald S. Bultje
d2205f92c3 Merge changes I98c18fe5,I80c37cff into experimental
* changes:
  Remove i4x4_pred.
  Remove unused table.
2013-05-30 19:06:44 -07:00
Ronald S. Bultje
e6485581fe Remove splitmv.
We leave it in rdopt.c as a local define for now - this can be removed
later. In all other places, we remove it, thereby slightly decreasing
the size of some arrays in the bitstream.

Change-Id: Ic2a9beb97a4eda0b086f62c039d994b192f99ca5
2013-05-30 17:21:01 -07:00
Ronald S. Bultje
1efa79d32f Remove i4x4_pred.
It remains as a local define in rdopt.c so we can distinguish between
split and non-split modes in the RD loop, but disappears outside that
scope in the codec.

Change-Id: I98c18fe5ab7e4fbd1d6620ec5695e2ea20513ce9
2013-05-30 16:44:58 -07:00
Ronald S. Bultje
f5827699bf Merge "Merge all intra mode coding trees into a single one." into experimental 2013-05-30 11:27:51 -07:00
Jingning Han
5e97862a71 Merge "Enable iterative motion search for 4x4 inter pred" into experimental 2013-05-30 11:02:10 -07:00
Ronald S. Bultje
98c192ae83 Merge all intra mode coding trees into a single one.
Also merge all counters. This removes a few unused probability updates
from the bitstream.

Change-Id: I20f58853e9dac84d8c0d9703ae012c55917516eb
2013-05-30 09:58:53 -07:00
Jim Bankoski
e987f03acd Merge "valgrind - txfm_thresh not set" into experimental 2013-05-30 09:34:48 -07:00
Deb Mukherjee
c98bfcfbbb Merge "Balancing coef-tree to reduce bool decodes" into experimental 2013-05-30 08:10:47 -07:00
Jim Bankoski
ecf023f6e4 Merge "fix valgrind warning" into experimental 2013-05-30 08:04:49 -07:00
Jingning Han
87626a8f6e Enable iterative motion search for 4x4 inter pred
This commit enables iterative motion search for 4x4/4x8/8x4 block
size compound inter-inter prediction.

WIP: borg run testing

Change-Id: I2b318db4a03cdca5a8002b3fa6c0fa89b129288b
2013-05-30 10:49:35 +01:00
Ronald S. Bultje
17544d1478 Merge "Remove some unused code related to macroblock/splitmv coding." into experimental 2013-05-29 17:35:05 -07:00
Jingning Han
5c05fbf6bb Merge "Refactor 4x4 block level rd loop" into experimental 2013-05-29 16:35:02 -07:00
Deb Mukherjee
b8b3f1a46d Balancing coef-tree to reduce bool decodes
This patch changes the coefficient tree to move the EOB to below
the ZERO node in order to save number of bool decodes.

The advantages of moving EOB one step down as opposed to two steps down
in the other parallel patch are: 1. The coef modeling based on
the One-node becomes independent of the tree structure above it, and
2. Fewer conext/counter increases are needed.

The drawback is that the potential savings in bool decodes will be
less, but assuming that 0s are much more predominant than 1's the
potential savings is still likely to be substantial.

Results on derf300: -0.237%

Change-Id: Ie784be13dc98291306b338e8228703a4c2ea2242
2013-05-29 16:25:52 -07:00
Jim Bankoski
aae78c8ac7 valgrind - txfm_thresh not set
For 4x4 blocks valgrind points out the cache was uninitalized.
This resolves the issue by setting it.

Change-Id: I22733000da048643762813a84fbda66d8e4040d2
2013-05-29 13:56:08 -07:00
Jingning Han
d0a3872019 Refactor 4x4 block level rd loop
This commit makes clean-ups in the rate-distortion loop for 4x4,
4x8, and 8x4 block sizes for the use of iterative motion search.

Removed unnecessary use of bmi in handle_inter_mode.

Deprecated loop over labels in the 4x4/4x8/8x4 block rd search.

Change-Id: I71203dbb68b65e66f073b37abd90d82ef5ae6826
2013-05-29 13:44:52 -07:00
Scott LaVarnway
353642bc53 Moved use_prev_in_find_mv_refs check to frame level
This patch checks at the frame level to see if the previous
mode info context can be used.  This patch eliminates the
flag check that was done for every mode and removes another
check that was done prior to every vp9_find_mv_refs().

Change-Id: I9da5e18b7e7e28f8b1f90d527cad087073df2d73
2013-05-29 16:42:23 -04:00
Jim Bankoski
5e5470b254 fix valgrind warning
scales for second reference frame vars are unitialized if the
second ref frame is one of of those disallowed by refframeflags

Change-Id: I4ce42de391178c1699dcaede18c5f12c84993c61
2013-05-29 12:34:10 -07:00
Jingning Han
84deeddbaf Merge "Refactor rd loop for inter modes" into experimental 2013-05-29 10:55:23 -07:00
Jingning Han
6c97bba403 Merge "further clean-ups on intra4x4 coding" into experimental 2013-05-29 10:55:14 -07:00
Sami Pietila
88a4d4c510 Residual coding to cache energy class of tokens.
Proposal for tuning the residual coding by changing how the context
from previous tokens is calculated. Storing the energy class of previous
tokens instead of the token itself eases the critical path of
HW implementations.

Change-Id: I6d71d856b84518f6c88de771ddd818436f794bab
2013-05-29 15:21:01 +01:00
Ronald S. Bultje
4487f5a690 Remove some unused code related to macroblock/splitmv coding.
Change-Id: Ic40d56fb162f4e201547dfae33e62ccd9e865889
2013-05-29 06:29:56 -07:00
Jingning Han
94d700e763 Refactor rd loop for inter modes
This commit pulls the iterative motion search for compound inter-
inter out from handle_inter_mode_ as a separate function. Hence,
it is applicable to 4x4/4x8/8x4 level compound inter search to be
enabled later.

Also edit the rd loop for 4x4 inter block sizes for cosmetic
purpose.

Change-Id: Ibc71a11cbe5a26cd52faba01026cf8446cf4d2b4
2013-05-28 16:31:33 -07:00
Jingning Han
4729a6f389 further clean-ups on intra4x4 coding
Removed one 4x4 prediction step that was unnessary in the rd loop.
Removed a unused modecosts estimate from encoder side.

Change-Id: I65221a52719d6876492996955ef04142d2752d86
2013-05-28 11:19:05 -07:00
Yaowu Xu
601bab4fde Merge "a few clean-ups" into experimental 2013-05-27 15:16:21 -07:00
Ronald S. Bultje
cba8e16e93 Decrease scope of frame_mv argument to handle_inter_mode().
Change-Id: I81c637c61ecc33cb66beb59a2a33166d66b9a0a2
2013-05-27 14:16:45 -07:00
Yaowu Xu
2b96ffe025 a few clean-ups
1. remove prediction mode conversion
2. unified bmode, same for key and non-key frame
3. set I4X4_PRED count for pdf to 0, as I4X4_PRED is no longer
coded ever. It is determined by ref_frame and block partition

Change-Id: If5b282957c24339b241acdb9f2afef85658fe47d
2013-05-27 13:53:56 -07:00
Ronald S. Bultje
f188bf1c3d Remove unused mode_index argument from handle_inter_mode().
Change-Id: I07b8c15f33e6e7c63dd0033c18c4ac5c0303cf32
2013-05-27 08:49:17 -07:00
Ronald S. Bultje
5cac66078e Remove splitmv.
Also do per-partition motion vector referencing in <sb8x8 partitions,
and adjust mvref finding for sub8x8 partitions.

Change-Id: Id3ed1ed4d2a8910d11d327db6cc63b8eb79f941f
2013-05-26 14:40:49 -07:00
Jingning Han
826efc838c Fix a bug in intra4x4 level rd loop
This commit fixed a uninitialized value use in the intra 4x4/8x4/4x8
rate-distortion loop.

Change-Id: I5c25b3536b59e4f5fbb23cf85baf93b2ccec7d72
2013-05-23 17:44:33 -07:00
Jingning Han
ae10319520 Make comp_inter_inter support 4x4 partition coding
This commit refactors the iterative motion search for compound
inter-inter mode, to make it support all partition types including
4x4/4x8/8x4 block sizes.

Change-Id: I5f1212b0f307377291763e45c6bdc9693b5f04c8
2013-05-23 13:13:42 +01:00
Paul Wilkins
33ecd6ad54 Merge Scatter Scan experiment.
Removal from under configure flag.
A bit  renaming

Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa
2013-05-23 13:09:27 +01:00
Jingning Han
7ac5ac52f9 Merge 4x4 block level partition into codebase
Move 4x4/4x8/8x4 partition coding out of experimental list.

This commit fixed the unit test failure issues. It also resolved
the merge conflicts between 4x4 block level partition and iterative
motion search for comp_inter_inter.

Change-Id: I898671f0631f5ddc4f5cc68d4c62ead7de9c5a58
2013-05-23 11:58:50 +01:00
Deb Mukherjee
ddb2309568 Merge "Using 128 entry look up table for coef models" into experimental 2013-05-22 10:38:35 -07:00
Jingning Han
d2cacdc530 Merge "Make the intra rd search support 8x4/4x8" into experimental 2013-05-22 10:00:15 -07:00
Deb Mukherjee
de4d682ca4 Using 128 entry look up table for coef models
Reverts to using 128 bit LUT for the coef models rather than 48
to ease hardware implementation.

Also incorporates some cleanups including removing various
hooks to support different lookup tables based on block_type and
ref_type.

Change-Id: I54100c120cca07a2ebd3a7776bc4630fa6a153f6
2013-05-22 08:44:31 -07:00
Paul Wilkins
0b713f8c18 Merge CONFIG_COMP_INTER_JOINT_SEARCH.
Merge this experiment so that it is under a speed feature
flag not a configuration flag.

Change-Id: I536f7f125a4ff5149bb3a64f791e835c324535fd
2013-05-22 11:23:31 +01:00
Jingning Han
f153a5d063 Make the intra rd search support 8x4/4x8
This commit allows the rate-distortion optimization of intra coding
capable of supporting 8x4 and 4x8 partition settings.

It enables the entropy coding of intra modes in key frame using a
unified contextual probability model conditioned on its above/left
prediction modes.

Coding performance:
derf 0.464%

Change-Id: Ieed055084e11fcb64d5d5faeb0e706d30268ba18
2013-05-21 21:03:00 -07:00
John Koleszar
ddf13be8ef Merge "Initial version of alpha channel support" into experimental 2013-05-21 17:29:51 -07:00
Deb Mukherjee
7a645e4e12 Merging the model coef prob experiment
Merges the experiment.

Change-Id: I4eb19af6de6df6aa3a96a2e82f231d47ed9b3ae9
2013-05-21 14:44:38 -07:00
Scott LaVarnway
1db6373267 Merge "WIP: 4x4 idct/recon merge" into experimental 2013-05-21 10:45:53 -07:00
Dmitry Kovalev
4ac70bd7d3 Adding get_ref_frame_idx function.
Change-Id: I4f1a4eca6794cda78d00512196caacd5567e2dcc
2013-05-20 16:09:00 -07:00
Deb Mukherjee
39a90bc8e8 Updating the model coef experiment
Cleans up the experiment. Actually uses reduced counts for backward
updates, and reduced number of probabilities in the context.

No change in bitstream when the experiment is on.

Between expt on and off:
derfraw300 is down only -0.062% (which is better than when expts
were run previously).

Change-Id: I55285a049a0c22810bdb42914212ab5a4f8521b5
2013-05-20 12:46:36 -07:00
Scott LaVarnway
ba48a11130 WIP: 4x4 idct/recon merge
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.

Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
2013-05-20 13:03:17 -04:00
Jingning Han
810b612c23 Enable bit-stream support to 8x4 and 4x8 partition
The recursive partition type search is enabled down to 4x4, 4x8 and
8x4, followed by the corresponding rate-distortion optimization for
the per-partition encoding mode decisions.

The bit-stream writing/reading synchronized in supporting the
rectangular partition of 8x8 block.

This provides above 1% coding performance gains on derf.

To do next:
1. re-design the rate-distortion loop for inter prediction below 8x8.
2. re-design the rate-distortion loop for intra prediction below 4x4.
3. make the loop-filter aware of rectangular partition of 8x8 block.
4. clean the unused probability models.
5. update default probability values.

Change-Id: Idd41a315b16879db08f045a322241f46f1d53f20
2013-05-19 14:59:04 -07:00
John Koleszar
679e4abdd5 Initial version of alpha channel support
This is a mostly-working implementation of an extra channel in the
bitstream. Configure with --enable-alpha to test. Notable TODOs:

 - Add extra channel to all mismatch tests, PSNR, SSIM, etc
 - Configurable subsampling
 - Variable number of planes (currently always uses all 4)
 - Loop filtering
 - Per-plane lossless quantizer
 - ARNR support

This implementation just uses the same contents as the Y channel
for the A channel, due to lack of content and general pain in
playing back 4 channel content. A later patch will use the actual
alpha channel passed in from outside the codec.

Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
2013-05-16 22:21:09 -07:00
Jingning Han
8e3d0e4d7d Add building blocks for 4x8/8x4 rd search
These building blocks enable rate-distortion optimization search
over block sizes of 8x4 and 4x8. Need to convert them into mmx/sse
forms.

Change-Id: I570ea2d22d14ceec3fe3575128d7dfa172a577de
2013-05-16 10:41:29 -07:00
Jingning Han
8468a5c1a0 Fix the transform type selection in 4x4 partition
This commit allows proper transform type (DCT/ADST) selection in
the settings of partition 4x4 level.

Change-Id: Iec6f922a46480d777e7ca9142a99e8c131f0077b
2013-05-15 16:09:58 -07:00
Jingning Han
1f26840fbf Enable recursive partition down to 4x4
This commit allows the rate-distortion optimization recursion
at encoder to go down to 4x4 block size. It deprecates the use
of I4X4_PRED and SPLITMV syntax elements from bit-stream
writing/reading. Will remove the unused probability models in
the next patch.

The partition type search and bit-stream are now capable of
supporting the rectangular partition of 8x8 block, i.e., 8x4
and 4x8. Need to revise the rate-distortion parts to get these
two partition tested in the rd loop.

Change-Id: I0dfe3b90a1507ad6138db10cc58e6e237a06a9d6
2013-05-14 12:39:56 -07:00
Yunqing Wang
dee12bdf8f Merge "Do joint motion search iteratively" into experimental 2013-05-14 10:18:11 -07:00
Yunqing Wang
60456083e9 Do joint motion search iteratively
Allow motion search multiple times iteratively, and break out
the loop if this search couldn't find better motion vectors.
Limit the maximum number of search to 2.

Tests results:
1. stdhd set: 0.311%(overall psnr); 0.346%(ssim).
positive gain on 10 out of 16 clips(best: 2.746% on sunflower;
worst: -0.434% on old_town_cross).
2. derf set: 0.016%(overall psnr); 0.062%(ssim).
positive gain on half of the clips(best: 0.499% on bowing;
worst: -0.387 on city).

Change-Id: Ibf0a51776d4caf7707be0586346db08128117559
2013-05-13 12:14:09 -07:00
Jingning Han
e996c9c5f1 Merge "Force bsize for UV in I4X4 and SPLITMV" into experimental 2013-05-13 10:51:39 -07:00
Paul Wilkins
e5f715201a Change to band calculation.
Change band calculation back to simpler model based
on the order in which coefficients are coded in scan order
not the absolute coefficient positions.

With the scatter scan experiment enabled the results were
appear broadly neutral on derf (-0.028) but up a little on std-hd +0.134).

Without the scatterscan experiment on the results were up derf as well.

Change-Id: Ie9ef03ce42a6b24b849a4bebe950d4a5dffa6791
2013-05-13 17:21:49 +01:00
Jingning Han
4c2c350309 Force bsize for UV in I4X4 and SPLITMV
Use 4x4 block coding for UV components arbitrarily in I4X4_PRED and
SPLITMV coding modes. This is a temporary solution to enable
bit-stream support for recursive partition down to 4x4 block size.
Will separate the functionalities of 4x4 block coding rate-distortion
out from those of superblocks.

Change-Id: I03dc15d5897014f175f3f2c91e9b266091d56797
2013-05-11 13:39:16 -07:00
Yunqing Wang
9755d9fda2 Remove unused mdcounts
mdcounts seems no longer used.

Change-Id: Idd8162e8acfa3f5be7a18767156cc79ccbc2bdee
2013-05-10 11:02:22 -07:00
Yunqing Wang
9f5811c2da Add joint motion search in comp_inter_inter mode(experiment)
In current code, motion vectors got from single prediction mode are used
in compound prediction mode directly. These motion vectors may not give
accurate prediction since they are searched independently. In this patch,
we took Pascal's suggestion, and did joint motion search in compound
prediction mode to find better motion vectors in this situation.
Test results:
Overall PSNR: 0.570%(derf), 0.918%(stdhd);
SSIM: 0.572%(derf), 1.009%(stdhd);

The encoder is a little slower. This can be improved since some c
code is used in motion search.

Change-Id: Ib30c9240f6c56c9b070867b4ca89412a76d9f3c6
2013-05-10 10:15:43 -07:00
Dmitry Kovalev
f0911886f3 Merge "Renaming 'Speed' to 'speed' inside VP9_COMP struct." into experimental 2013-05-08 16:35:35 -07:00
Dmitry Kovalev
8f4e9ac8bc Removing y_to_uv_block_size and y_bsizet_to_block_size functions.
Change-Id: I49527ff8dd8bef1074c18a964fed2a575f0b118a
2013-05-08 15:23:42 -07:00
Dmitry Kovalev
4be190d9d0 Renaming 'Speed' to 'speed' inside VP9_COMP struct.
Change-Id: I4374b5af40ee9082ddf7956a9756a15ad9ad5436
2013-05-08 14:35:42 -07:00
John Koleszar
14a5c7285b Make switchable filter search subsampling-aware
Makes the temporary storage of the filtered data agnostic to
the number of planes and how they're subsampled.

Change-Id: I12f352cd69a47ebe1ac622af30db29b49becb7f4
2013-05-07 21:57:00 -07:00
John Koleszar
7465f52f81 Merge "Make setup_pred_block subsampling-aware." into experimental 2013-05-07 21:53:31 -07:00
Dmitry Kovalev
80997b3aa2 Merge "Adding get_switchable_rate function." into experimental 2013-05-07 17:10:48 -07:00
Paul Wilkins
a14ae84749 Deprecate code_zerogroup experiment.
Delete code under the CONFIG_CODE_ZEROGROUP flag.

Change-Id: I5fe6c7b42a5da9b73118e33594301da4129f320a
2013-05-07 16:52:55 -07:00
Dmitry Kovalev
455816231e Adding get_switchable_rate function.
Change-Id: I71311a14f8d7f48508b250f25d1d0914c6a1ac72
2013-05-07 16:52:04 -07:00
Paul Wilkins
1ed57a6a62 Deprecate comp_interintra_pred experiment.
Delete code under the CONFIG_COMP_INTERINTRA_PRED
flag.

Change-Id: I3d1079cf46305c08f7e11d738596ea112e7b547f
2013-05-07 16:24:08 -07:00
Paul Wilkins
8c1b516d10 Deprecate the newbintramode experiment.
Clean out code relating to newbintramode.

Change-Id: Ie91f4f156cdf60ce0da8ca407c1c9cb00c7d0705
2013-05-07 16:00:59 -07:00
Jingning Han
cf8b5a09ed Add building blocks for partition down to 4x4
Macro ab4x4 contains experiments for recursive partition down to
4x4 block size.

Change-Id: Ic727842fa98a4df9fd51e0025a545dc76a5c76c1
2013-05-07 12:11:51 -07:00
John Koleszar
e559e14fa6 Make setup_pred_block subsampling-aware.
Code previously set up the pointers by scaling by MI_UV_SIZE, which
is 4:2:0 only.

Change-Id: Ic13a92895cff018ec1345736746ed84cb31e6e31
2013-05-07 11:47:45 -07:00
Jingning Han
776c1482a3 Merge SB8X8 into the codebase
Pull sb8x8 out of experimental list. verified via borg run tests.
Fixed unit test failures.

Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76
2013-05-07 09:08:25 -07:00
Dmitry Kovalev
2e5f0084f3 Adding model_rd_for_sb function.
Iterating over all planes in the loop instead of custom y,uv code inside
handle_inter_mode function.

Change-Id: I301f9276d6d544c2fd7203d84f1318ac80ea625d
2013-05-06 12:42:53 -07:00
Jingning Han
8e1c97cf73 Fix a unit test failure of sb8x8 on scaling ref
Disable the use of scaled reference frame for motion search in
SPLITMV mode. This fixes the unit test failure issue triggered
when merging sb8x8 from experimental list.

Change-Id: I02ac25fd8db8d5762f8fee29513b947189875fa0
2013-05-06 10:28:18 -07:00
Ronald S. Bultje
f7fa367094 Fix first-pass intra4x4 for sb8x8 experiment.
Change-Id: I1df17f45721c690d157800daa6a0b377e3d32bc2
2013-05-04 15:49:41 -07:00
Ronald S. Bultje
842c573e04 Merge "Fix overflow in RD error calculation code." into experimental 2013-05-03 18:03:06 -07:00
John Koleszar
6c622e2783 Merge "Separate transform and quant from vp9_encode_sb" into experimental 2013-05-03 17:19:01 -07:00
John Koleszar
4529c68b3b Separate transform and quant from vp9_encode_sb
This allows removing a large number of transform size specific functions,
as well as supporting 444/alpha by routing all code through the
subsampling-aware path.

Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
2013-05-03 12:14:50 -07:00
Ronald S. Bultje
ee808e52bd Fix overflow in RD error calculation code.
Change-Id: I61ef1f198c876f9f79787ea7d7385a862cfbae19
2013-05-03 10:33:07 -07:00
Dmitry Kovalev
7ab2d7bf55 Removing MAXF macro and using MAX instead.
Change-Id: I51c53692b1150005645bf362c5e5a8275178a8fd
2013-05-02 11:57:16 -07:00
Ronald S. Bultje
f37d8400db Store splitmv modes in context after 8x8 rd loop.
Change-Id: I07aa89a67e0ac5f99ef0c448553dbc46b0ed27f2
2013-05-01 17:13:23 -07:00
Ronald S. Bultje
b6c2d872f0 Fix some crashes in sb8x8 experiment.
Change-Id: I390bb1cedc835f439fd5dd6cda6572b29cbb139c
2013-05-01 14:45:27 -07:00
Dmitry Kovalev
79590f186c Merge "Cleaning up encoder segmentation code." into experimental 2013-04-30 17:49:55 -07:00
Ronald S. Bultje
d068d869b9 sb8x8 integration in rd loop.
Work-in-progress, not yet ready for review. TODO items:
- bitstream writing (encoder) and reading (decoder)
- decoder reconstruction

Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
2013-04-30 16:13:20 -07:00
Dmitry Kovalev
51a73fbba2 Merge "Consistent names for quant-related functions and variables." into experimental 2013-04-30 10:19:48 -07:00
Dmitry Kovalev
ee97da2c03 Cleaning up encoder segmentation code.
Moving code from vp9_pack_bitstream to new function encode_segmentation.

Change-Id: I1f1e59a1f038618ad95162b7db4b6f8164850ea8
2013-04-29 16:07:17 -07:00
Ronald S. Bultje
2dbaa4f4f4 Change above/left_context to use an 8x8 basis.
Output changes slightly because of a minor bug in (at least) the sb32x16
block2above tx16x16 tables that previously existed in vp9_blockd.c.

Change-Id: I624af28ac200a8322d64454cf05c79e9502968cc
2013-04-29 10:37:25 -07:00
Dmitry Kovalev
5a5a1f25a8 Consistent names for quant-related functions and variables.
Change-Id: I3a6d601e90e8740b9c26dd0afbfe9d467b75d367
2013-04-26 12:30:20 -07:00
Ronald S. Bultje
1a46b30ebe Grow MODE_INFO array to use an 8x8 basis.
Change-Id: I087e08e7909a406b71715b8525c104208daa6889
2013-04-26 11:57:17 -07:00
John Koleszar
bb41ab4a0c Remove BLOCKD structure
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.

Change-Id: I7ff2fa72d22c29163eb558981c8193765a8113d9
2013-04-26 10:35:54 -07:00
John Koleszar
4f55c5618a Remove destination pointers from BLOCKD
Access these members from MACROBLOCKD instead.

Change-Id: I7907230dd473ff12ebe182b9280d8b7f12a888c4
2013-04-26 10:14:07 -07:00
John Koleszar
4b27eb1f18 Merge "quantize: make 4x4, 8x8 common with larger transforms" into experimental 2013-04-26 09:08:49 -07:00
Scott LaVarnway
57f180b388 Removed bmi from blockd
This originally was "Removed update_blockd_bmi()".  Now,
this patch removed bmi from blockd and uses the bmi found
in mode_info_context.  Eliminates unnecessary bmi copies between
blockd and mode_info_context.

Change-Id: I287a4972974bb363f49e528daa9b2a2293f4bc76
2013-04-26 10:19:43 -04:00
John Koleszar
a672351af9 quantize: make 4x4, 8x8 common with larger transforms
There were 4 variants of the quantize loop in vp9_quantize.c, now
there is 1.

Change-Id: Ic853393411214b32d46a6ba53769413bd14e1cac
2013-04-25 14:44:54 -07:00
Ronald S. Bultje
18f29ff581 Remove duplicate code in RD handle_inter_mode() function.
Change-Id: I552d53f7e7331e9246d8a32d6c6dcc0cfa0cbeb0
2013-04-25 14:21:21 -07:00
Ronald S. Bultje
c849eaca59 Use b_width/height_log2 instead of mb_ where appropriate.
Basic assumption: when talking about transform units, use b_; when
talking about macroblock indices, use mb_.

Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
2013-04-25 14:20:59 -07:00
John Koleszar
a99e1aa8ca Remove predictor pointers from BLOCKD
Access these members from MACROBLOCKD instead.

Change-Id: I2574622e577bb9feede47f6b7ccbb11f3e928ca8
2013-04-25 12:04:07 -07:00
John Koleszar
6c0c6b86c1 Remove diff from BLOCKD
The underlying storage for these buffers is in the per-plane MACROBLOCKD
area, so read it from there directly.

Change-Id: Id6bd835117fdd9dea07db95ad06eff9f12afaaf7
2013-04-25 11:57:22 -07:00
John Koleszar
15255eef82 Move dequant from BLOCKD to per-plane MACROBLOCKD
This data can vary per-plane, but not per-block.

Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
2013-04-25 11:57:20 -07:00
John Koleszar
4bd0f4f646 Remove BLOCK structure
All members can be referenced from their per-plane counterparts, and
removes assumptions about 24 blocks per macroblock.

Change-Id: I593fb0715e74cd84b48facd1c9b18c3ae1185d4b
2013-04-25 11:33:17 -07:00
Dmitry Kovalev
61a47da869 Adding is_inter_mode function.
Change-Id: I2d32d46002cb92c63050c2b8328865c406103621
2013-04-25 10:23:00 -07:00
Jingning Han
b0e3b3df18 Move sbsegment out of experimental list
Move rectangular superblock coding out of experimental list.

Change-Id: I96c37547d122330d666a67b4bf577ae54547857f
2013-04-24 15:19:17 -07:00
Jingning Han
ff2b8aa2c9 Contextual entropy coding of partition syntax
This commit enables selecting probability models for recursive block
partition information syntax, depending on its above/left partition
information, as well as the current block size. These conditional
probability models are reasonably stationary and consistent across
frames, hence the backward adaptive approach is used to maintain and
update the contextual models.

It achieves coding performance gains (on top of enabling rectangular
block sizes):
derf:   0.242%
yt:     0.391%
hd:     0.376%
stdhd:  0.645%

Change-Id: Ie513d9673337f0d27abd65fb566b711d0844ec2e
2013-04-24 14:23:14 -07:00
John Koleszar
bc30736f9b Merge "Remove coeff from BLOCK" into experimental 2013-04-23 17:42:12 -07:00
John Koleszar
aa6a36b062 Merge "Convert coeff to per-plane MACROBLOCK data" into experimental 2013-04-23 17:41:59 -07:00
John Koleszar
48f3e66e16 Remove coeff from BLOCK
Lookup the data per-plane from the MACROBLOCK struct.

Change-Id: I9253c4d3cf886aa9ab4aeab23a2156bfcf994ede
2013-04-23 16:39:21 -07:00
John Koleszar
138ec38cab Convert coeff to per-plane MACROBLOCK data
This commit moves the coeff storage from the MACROBLOCK struct to its
per-plane part. The next commit will remove the coeff member from the
BLOCK structure so that it is consistently accessed per-plane.

Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
variable subsampling aware.

Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
2013-04-23 16:28:17 -07:00
John Koleszar
4f35e3e1c1 Merge "Move src_diff to per-plane MACROBLOCK data" into experimental 2013-04-23 16:24:08 -07:00
Dmitry Kovalev
d0d1094a05 Merge "Adding get_scan_{4x4, 8x8, 16x16} functions." into experimental 2013-04-23 12:44:51 -07:00
John Koleszar
cbd1315ac4 Move src_diff to per-plane MACROBLOCK data
First in a series of commits making certain MACROBLOCK members
addressable per-plane. This commit also refactors the block subtraction
functions vp9_subtract_b, vp9_subtract_sby_c, etc to be
loops-over-planes and variable subsampling aware.

Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3
2013-04-23 12:18:51 -07:00
Deb Mukherjee
611b26bbe0 Merge "Removing the implicit compound inter experiment" into experimental 2013-04-22 23:22:28 -07:00
Deb Mukherjee
735febf1ce Removing the implicit compound inter experiment
Removing this experiment for now, since it has been broken with
the latest code changes.

Change-Id: I1be2181b56de490fcb577f5905b5e147a8ed82d8
2013-04-22 16:46:54 -07:00
Jim Bankoski
366ff224ef Merge "new version of speed 1" into experimental 2013-04-22 16:42:33 -07:00
Jim Bankoski
e7bddba149 new version of speed 1
This version of speed 1 only disables modes at higher resolution that
had distortions >2x the best mode we found...

The hope is that this could be a replacement for speed 0 ...

Change-Id: I7421f1016b8958314469da84c4dccddf25390720
2013-04-22 15:42:41 -07:00
Dmitry Kovalev
5de7e16ca2 Adding get_scan_{4x4, 8x8, 16x16} functions.
Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438
2013-04-22 14:08:41 -07:00
John Koleszar
a443447b8b Move pre, second_pre to per-plane MACROBLOCKD data
Continue moving framebuffers to per-plane data.

Change-Id: I237e5a998b364c4ec20316e7249206c0bff8631a
2013-04-22 12:05:24 -07:00
Deb Mukherjee
f12509f640 Merge "Removes the code_nonzerocount experiment" into experimental 2013-04-22 11:53:14 -07:00
Deb Mukherjee
0aa79be7d5 Removes the code_nonzerocount experiment
This patch does not seem to give any benefits.

Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db
2013-04-22 10:58:49 -07:00
Deb Mukherjee
6ce718eb18 Merge "End of orientation zero group experiment" into experimental 2013-04-22 10:33:12 -07:00
Deb Mukherjee
70d9f116fd End of orientation zero group experiment
Adds an experiment that codes an end-of-orientation symbol
for every eligible zero encountered in scan order.

This cleans out various other sub-experiments that were part
of the origiinal patch, which will be later included if found
useful.

Results are slightly positive on all sets (0.1 - 0.2% range).

Change-Id: I57765c605fefc7fb9d1b57f1b356843602abefaf
2013-04-22 09:27:59 -07:00
John Koleszar
6d5ac8f2e1 reconinter: remove unnecessary functions, params
Removes the redundant dst pointers from vp9_build_inter_predictors_sb{y,uv}
and the remaining mb specific functions.

Change-Id: I7b6bf439d9394b85ea79b4fe61a3ffc1025720da
2013-04-22 08:20:54 -07:00
John Koleszar
fa8ddbd2a6 Merge "Move dst to per-plane MACROBLOCKD data" into experimental 2013-04-19 16:33:45 -07:00
John Koleszar
d12376aa2c Move dst to per-plane MACROBLOCKD data
First in a series of commits moving the framebuffers pointers to
per-plane data, so that they can be indexed numerically rather than
by name.

Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
2013-04-19 16:16:10 -07:00
Yunqing Wang
25edb68100 Merge "Remove unused parameters in handle_inter_mode" into experimental 2013-04-19 14:12:43 -07:00
Paul Wilkins
fb754fd37e Merge "Mv ref candidates cut to 2." into experimental 2013-04-19 14:09:44 -07:00
Dmitry Kovalev
3689122b1c Merge "Fixing member names inside TOKENVALUE and TOKENEXTRA structs." into experimental 2013-04-19 10:09:04 -07:00
Jim Bankoski
35b1d2e38f Merge "catch all for new block sizes" into experimental 2013-04-19 09:57:38 -07:00
Jim Bankoski
afb04eb211 catch all for new block sizes
Just make sure we don't stop them from testing in speed 1.

Change-Id: Iec9b3dba0a32616ff7a451207e0f54b81bb72575
2013-04-19 09:48:56 -07:00
Jim Bankoski
6d82fe219d Merge "set up a new speed 1" into experimental 2013-04-19 08:28:35 -07:00
Paul Wilkins
de80da39dc Mv ref candidates cut to 2.
Further simplification of mvref search to return
only the top two candidates. Distance weights removed
as the test order reflects distance anyway.

Change-Id: I0518cab7280258fec2058670add4f853fab7b855
2013-04-19 16:13:53 +01:00
Jim Bankoski
b6ef0823c5 set up a new speed 1
slightly worse results for faster encodes

Change-Id: I25ea82a18ce20635dbcd328808c1d05ac1f58fd7
2013-04-19 08:04:57 -07:00
Paul Wilkins
92e8a3f514 Simplification of MVref search.
As we are no longer able to sort the candidate
mvrefs in both encoder and decode and given
that the cost of explicit signalling has proved
prohibitive, it no longer makes sense to find more
than 2 candidates.

This patch:

Modifies and simplifies add_candidate_mv()

Removes the forced addition of a 0 vector in the
MAX_MV_REF_CANDIDATES-1 position (in preparation
to reducing MAX_MV_REF_CANDIDATES to 2).

Re-orders the addition of candidates slightly.

This actually gives small gains (circa 0.2% on std-hd)

A subsequent patch will remove NEW_MVREF experiment,
reduce MAX_MV_REF_CANDIDATES to 2 and remove distance
weights as these are implicit now in the order.

Change-Id: I3dbe1a6f8a1a18b3c108257069c22a1141a207a4
2013-04-19 11:19:59 +01:00
Dmitry Kovalev
77f4697a13 Fixing member names inside TOKENVALUE and TOKENEXTRA structs.
Change-Id: I183ec5819d4d80966c92db36db75b8c3be0d381d
2013-04-18 16:18:08 -07:00
Jingning Han
f0b065e946 Merge "Make the use of pred buffers consistent in MB/SB" into experimental 2013-04-18 15:24:55 -07:00
Jingning Han
6f43ff5824 Make the use of pred buffers consistent in MB/SB
Use in-place buffers (dst of MACROBLOCKD) for  macroblock prediction.
This makes the macroblock buffer handling consistent with those of
superblock. Remove predictor buffer MACROBLOCKD.

Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
2013-04-18 14:59:36 -07:00
Dmitry Kovalev
a8d903e539 Merge "Replacing VP9_COMBINEENTROPYCONTEXTS macro with function." into experimental 2013-04-18 14:26:34 -07:00
Dmitry Kovalev
8b20aa2337 Merge "Renaming y1dc_delta_q, uvdc_delta_q, uvac_delta_q fields from VP9Common." into experimental 2013-04-18 14:26:06 -07:00
Yunqing Wang
e304160885 Remove unused parameters in handle_inter_mode
Removed 2 unused parameters.

Change-Id: Ic2862569313c404047072b268c3d2be3f635492c
2013-04-18 11:55:46 -07:00
Ronald S. Bultje
e693472236 Fairly basic integration of rectangular blocks in encoding RD loop.
Adds RD integration for 32x16, 16x32, 64x32 and 32x64 rectangular blocks.
Derf almost +0.6%, HD a little over +1.0%, STDHD +1.3%.

Change-Id: Id651fdb6a655fdbb5c47009757e63317acfb88a5
2013-04-17 09:25:06 -07:00
Dmitry Kovalev
9087d6d470 Replacing VP9_COMBINEENTROPYCONTEXTS macro with function.
Change-Id: I3bbc31840af69481e1d9bb4427c9ee25abf82946
2013-04-16 15:30:28 -07:00
Dmitry Kovalev
1ad7c1f250 Renaming y1dc_delta_q, uvdc_delta_q, uvac_delta_q fields from VP9Common.
New names are y_dc_delta_q, uv_dc_delta_q, uv_ac_delta_q.

Change-Id: I4acae1fc23a4697ce2c5a5becb8dc28ef0a4b552
2013-04-16 15:05:52 -07:00
John Koleszar
e3cfe4e89e Remove the mb_no_coeff_skip flag
This flag was added to VP8 to allow a mode where MB-level skipping
was not allowed, saving a bit per mb. It was never used in practice,
and hasn't been tested in VP9, so remove it.

Change-Id: Id450ec6904c6d06c1919508e7efc52d05cde5631
2013-04-16 12:36:16 -07:00
Dmitry Kovalev
a0d9309eab Removing TRUE and FALSE macro definitions.
Using regular 0 and 1 constants now.

Change-Id: Ie763503cbb727847cc8f1d6506cd6f2ee607f056
2013-04-15 15:24:39 -07:00