Commit Graph

1218 Commits

Author SHA1 Message Date
Scott LaVarnway
02162f1be5 Remove unnecessary vp9_copy_memNxN() calls
The build predictors now output directly to the dest.  These
copies are no longer required.

Change-Id: I8e91eda6b49993e6c16cfadc705d82a7e80f19b2
2013-04-11 20:03:48 -04:00
Jingning Han
815e95fbeb Make intra predictor support rectangular blocks
The intra predictor supports configurable block sizes. It can handle
intra prediction down to 4x4 sizes, when enabled in BLOCK_SIZE_TYPE.

Change-Id: I7399ec2512393aa98aadda9813ca0c83e19af854
2013-04-11 16:45:57 -07:00
Ronald S. Bultje
d415d28717 Merge loop over all macroblock modes into encode_sb_row().
Rename pick_mb_modes to pick_mb_mode, since it now handles only a
single macroblock. This is consistent with pick_sb_mode handling a
single non-macroblock.

Change-Id: I896fdfa06436b2d8c24d6474718cc74420df6b3b
2013-04-11 15:56:39 -07:00
John Koleszar
2f19cd03aa Merge "Remove unused vp9_recon_mb{y,uv}_s" into experimental 2013-04-11 15:51:20 -07:00
Ronald S. Bultje
deeef42b77 Merge "Remove subtract_mb* functions." into experimental 2013-04-11 15:50:40 -07:00
Dmitry Kovalev
4fdf8ccca2 Adding vp9_read_and_apply_sign function.
Change-Id: I9951a06dbe4514cc1cf69ff4349c4e12cb4a318c
2013-04-11 15:36:43 -07:00
Scott LaVarnway
cff266bbef Merge "WIP: removing predictor buffer usage from decoder" into experimental 2013-04-11 15:24:33 -07:00
Ronald S. Bultje
56d01ee0a6 Merge "Remove unused macroblock versions of reconstruction functions." into experimental 2013-04-11 15:19:08 -07:00
Ronald S. Bultje
44dc18064e Merge "Remove "tplist" from VP9_COMP." into experimental 2013-04-11 15:17:03 -07:00
Ronald S. Bultje
69902c6bf0 Merge "Merge pick_sb_modes and pick_sb64_modes." into experimental 2013-04-11 15:06:37 -07:00
Deb Mukherjee
7a97959f13 Merge "Turning model-based updates on with modelcoefprob" into experimental 2013-04-11 14:54:53 -07:00
Deb Mukherjee
66f413af4f Turning model-based updates on with modelcoefprob
This patch changes the default with the modelcoefprob experiment
to use model-based forward updates with one-node pegged
modeling.

The maximum difference with fully trained tables is now
less than 0.1%.

Change-Id: I06b44322e10c6703f93f3c1d48d973b1136a0618
2013-04-11 14:45:26 -07:00
John Koleszar
4ba74ae81a Merge "Remove unused vp9 ppc files" into experimental 2013-04-11 14:39:18 -07:00
John Koleszar
c382ed09f8 Remove unused vp9_recon_mb{y,uv}_s
These functions are now handled through the common superblock code.

Change-Id: Ib6688971bae297896dcec42fae1d3c79af7a611c
2013-04-11 14:05:59 -07:00
Scott LaVarnway
6189f2bcb1 WIP: removing predictor buffer usage from decoder
This patch will use the dest buffer instead of the
predictor buffer.  This will allow us in future commits
to remove the extra mem copy that occurs in the dequant
functions when eob == 0.  We should also be able to remove
extra params that are passed into the dequant functions.

Change-Id: I7241bc1ab797a430418b1f3a95b5476db7455f6a
2013-04-11 13:55:18 -07:00
John Koleszar
8bf6de725c Merge changes I6721e42f,Iaffb1ae8 into experimental
* changes:
  tokenize: convert skippable functions
  Add foreach_transformed_block
2013-04-11 13:36:25 -07:00
John Koleszar
633d9e7b4f Remove unused vp9 ppc files
Change-Id: I3fe8c529ddec658cfa2376cfc05d9c8a5366e978
2013-04-11 13:29:37 -07:00
Dmitry Kovalev
24f18e1c34 Renaming vp9_token_struct to vp9_token and removing previous typedef.
Change-Id: If69c3d795f87af5cc7bfdfe70ef733c41b4d55c8
2013-04-11 13:01:52 -07:00
John Koleszar
c2bd46bf45 tokenize: convert skippable functions
Use the common block walker to calculate skippability.

Change-Id: I6721e42f065df237426c91c1d871ec226ba7cdcb
2013-04-11 12:27:37 -07:00
Ronald S. Bultje
340bc46f49 Remove subtract_mb* functions.
Use subtract_sb* instead.

Change-Id: I3f34140ab97061063a4452945347ef1fe37e13d1
2013-04-11 12:27:15 -07:00
Ronald S. Bultje
13e41ba440 Remove unused macroblock versions of reconstruction functions.
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
generic _sb* functions that take a size argument, and call them with
BLOCK_SIZE_MB16X16.

Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
2013-04-11 12:27:15 -07:00
Ronald S. Bultje
2e2b8a53cc Remove "tplist" from VP9_COMP.
It is write-only.

Change-Id: I2412344688d96593cc01c038e7f51410d0f85ed0
2013-04-11 12:27:14 -07:00
John Koleszar
42471f6b72 Add foreach_transformed_block
Adds a framework for applying an arbitrary function to each
transform-sized block in the mb/sb.

Change-Id: Iaffb1ae8db5ff2abfa8720c608c78376b42f2096
2013-04-11 11:42:19 -07:00
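A minimal sketch of the block-walker idea described in 42471f6b72, assuming a square region measured in 4x4 units. The names visit_fn, foreach_transformed_block_sketch and count_blocks are hypothetical and are not the libvpx signatures; the commit message does not show the real interface.

```c
#include <assert.h>

/* Hypothetical callback type: invoked once per transform-sized block. */
typedef void (*visit_fn)(int row4, int col4, int tx_size4, void *arg);

/* Walk a size4 x size4 region (in 4x4 units) in raster order, stepping by
 * the transform size (also in 4x4 units). Illustrative only. */
static void foreach_transformed_block_sketch(int size4, int tx_size4,
                                             visit_fn visit, void *arg) {
  int r, c;
  assert(tx_size4 > 0 && size4 % tx_size4 == 0);
  for (r = 0; r < size4; r += tx_size4)
    for (c = 0; c < size4; c += tx_size4)
      visit(r, c, tx_size4, arg);
}

/* Example visitor: counts how many blocks were visited. */
static void count_blocks(int row4, int col4, int tx_size4, void *arg) {
  (void)row4;
  (void)col4;
  (void)tx_size4;
  ++*(int *)arg;
}
```

Passing the per-block work as a callback lets one walker serve tokenization, skippability checks and similar passes, which appears to be the intent of the follow-up "tokenize: convert skippable functions" change.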
John Koleszar
c18b2617a4 Remove vp9_reset_mb_tokens_context
Use sb-common version instead.

Change-Id: If2552b5a39fd2e5272f66a41c5667dda85fd3939
2013-04-11 11:39:19 -07:00
Dmitry Kovalev
ec299e2092 Encoder code cleanup.
Removing duplicated code from vp9_encodemv.c and reusing ROUND_POWER_OF_TWO
macro definitions.

Change-Id: I9caf0c17f761ada7905cb99a3e2a31f871fef0f9
2013-04-11 11:08:00 -07:00
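For readers unfamiliar with the macro named in ec299e2092, the sketch below shows the usual form of a round-to-nearest power-of-two shift. The definition is an assumption about the tree at this commit, and average4 is a hypothetical caller added only for illustration.

```c
/* Divide `value` by 2^n with rounding to nearest.
 * Assumes n >= 1 and a non-negative value. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical helper: rounded average of four samples. */
static int average4(int a, int b, int c, int d) {
  return ROUND_POWER_OF_TWO(a + b + c + d, 2);
}
```

Reusing one macro instead of repeating the add-half-then-shift expression keeps the rounding rule identical at every call site.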
Ronald S. Bultje
605ff051f7 Merge pick_sb_modes and pick_sb64_modes.
Change-Id: Iad69e7a3b7e470acf6094f6a52e7da69066fd552
2013-04-11 09:33:49 -07:00
Ronald S. Bultje
38d7945345 Slight simplification of SB RD loop recursion conditions.
Change-Id: I87a406fcd18ab043253ca0c009d1182fdc5c3046
2013-04-11 09:14:55 -07:00
Ronald S. Bultje
4eb537c0e6 A few more cases where sb_type was used arithmetically.
With these fixed, the codec produces identical results regardless of
what literal values are used for the enum members in BLOCK_SIZE_*.

Change-Id: I26db8e08019b58ba432af1f0950ebe6b0eb4ad8c
2013-04-10 18:04:57 -07:00
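The point of 4eb537c0e6 and 8fb5be48a6 can be sketched as follows: derive block properties from a per-member lookup rather than from arithmetic on the enum's numeric value. The enum and function names here are hypothetical, and the literal values are deliberately shuffled to show that the result does not depend on them.

```c
#include <assert.h>

/* Hypothetical enum; the literal values are intentionally arbitrary. */
typedef enum {
  SKETCH_BLOCK_64X64 = 0,
  SKETCH_BLOCK_16X16 = 1,
  SKETCH_BLOCK_32X32 = 2,
  SKETCH_BLOCK_COUNT
} sketch_block_size;

/* Width in pixels, looked up per member instead of computed from the
 * numeric value of the enum (for example `16 << type`). */
static int block_width(sketch_block_size type) {
  static const int width[SKETCH_BLOCK_COUNT] = {
    [SKETCH_BLOCK_16X16] = 16,
    [SKETCH_BLOCK_32X32] = 32,
    [SKETCH_BLOCK_64X64] = 64,
  };
  assert((int)type >= 0 && (int)type < SKETCH_BLOCK_COUNT);
  return width[type];
}
```

With this shape, inserting new (for example rectangular) members into the enum does not silently change results elsewhere.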
Ronald S. Bultje
33d94a843f Remove copying of coefficients and predictor in i8x8 RD loop.
The resulting values are never used.

Change-Id: I688caf30da9aab87aa280cce913eda4f33172293
2013-04-10 17:39:03 -07:00
Ronald S. Bultje
8fb5be48a6 Make usage of sb_type independent of literal values.
Change-Id: I0d12f9ef9d960df0172a1377f8e5236eb6d90492
2013-04-10 17:38:57 -07:00
Ronald S. Bultje
b4f6098ef7 Make RD superblock mode search size-agnostic.
Merge various super_block_yrd and super_block_uvrd versions into one
common function that works for all sizes. Make transform size selection
size-agnostic also. This fixes a slight bug in the intra UV superblock
code where it used the wrong transform size for txsz > 8x8, and stores
the txsz selection for superblocks properly (instead of forgetting it).
Lastly, it removes the trellis search that was done for 16x16 intra
predictors, since trellis is relatively expensive and should thus only
be done after RD mode selection.

Gives basically identical results on derf (+0.009%).

Change-Id: If4485c6f0a0fe4038b3172f7a238477c35a6f8d3
2013-04-10 16:50:30 -07:00
Jingning Han
a4579e04c9 Merge "Make dequant/idct block size independent" into experimental 2013-04-10 16:47:53 -07:00
Jingning Han
bbd0063b5c Make dequant/idct block size independent
The unified dequantization, inverse transform, and adding functions
support rectangular block sizes. Also separate the operations on
luma and chroma components, in the consideration of the txfm_size
for uv components in rectangular block sizes.

Change-Id: I2a13246b2a9086b37d575d346070990d854cc110
2013-04-10 15:54:43 -07:00
Yaowu Xu
8e9819230d Merge "Remove obsolete code" into experimental 2013-04-10 14:56:28 -07:00
Yaowu Xu
2da90fddc2 Remove obsolete code
The strategy to run fast loop filter picking for encoder speed-up
should be revisited at a later stage.

Change-Id: I3b75e06d767cff41be952a42e63b3292f4eab996
2013-04-10 13:45:22 -07:00
Jingning Han
5b9dc7c68e Merge "Make SB Decoding units size-independent" into experimental 2013-04-10 13:43:51 -07:00
Dmitry Kovalev
0cef7234e1 Merge "Fixing upper case names." into experimental 2013-04-10 13:29:38 -07:00
Jingning Han
e63099d199 Make SB Decoding units size-independent
Unify the sb32x32 and sb64x64 decoding units, which also allow for
other rectangular block sizes.

Change-Id: Ia5187ab2af56f98c3f99272bdf4dbcabe798ad5d
2013-04-10 10:52:10 -07:00
Dmitry Kovalev
1c6df34c06 Merge "Code cleanup in bitstream code." into experimental 2013-04-10 10:18:50 -07:00
Dmitry Kovalev
2759ce85ad Merge "Adding setup_quantization function." into experimental 2013-04-10 10:16:30 -07:00
Dmitry Kovalev
b41e297582 Merge "Renaming inverse hybrid transform functions." into experimental 2013-04-10 10:16:00 -07:00
Dmitry Kovalev
20645ec4fb Merge "Cleanup of set_offsets function." into experimental 2013-04-10 10:15:13 -07:00
Ronald S. Bultje
1932828d19 Merge "Make SB coding size-independent." into experimental 2013-04-10 08:51:58 -07:00
Ronald S. Bultje
9b46e30494 Merge "Don't use BLOCKD in vp9_invtrans.c." into experimental 2013-04-09 21:36:09 -07:00
Ronald S. Bultje
a3874850dd Make SB coding size-independent.
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
gives identical encoder results before and after. There are a few
macros for rectangular block sizes under the sbsegment experiment; this
experiment is not yet functional and should not yet be used.

Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
2013-04-09 21:28:27 -07:00
Dmitry Kovalev
f370db0cf4 Adding setup_quantization function.
Change-Id: I8fe25a905717a3cd2da5f87ba0403357536183cf
2013-04-09 18:24:08 -07:00
Yunqing Wang
d3c526fbda Merge "Fix an issue in set_refs()" into experimental 2013-04-09 14:31:51 -07:00
John Koleszar
a3ec4cbd33 Merge "detokenize: use consistent structure for all block sizes" into experimental 2013-04-09 14:18:59 -07:00
Yunqing Wang
01a3bd67d8 Fix an issue in set_refs()
Scale factor for second ref frame wasn't assigned in the code.

Change-Id: I6ef3f3f71bd652a879ad847369c54c744782ea37
2013-04-09 12:33:28 -07:00
Dmitry Kovalev
02349561b6 Renaming inverse hybrid transform functions.
Renaming vp9_ht_dequant_idct_add* functions to vp9_dequant_iht_add*.

Change-Id: Ie427b322b1cc7c8f39d1155f5df91dedfbd944af
2013-04-09 11:09:23 -07:00
Dmitry Kovalev
c34f6fcb54 Fixing upper case names.
Renaming Y1dequant to y_dequant, UVdequant to uv_dequant, QIndex to qindex.

Change-Id: I1c356e5f886deb3f8807dc212de9799b55b09d58
2013-04-09 10:46:57 -07:00
Dmitry Kovalev
df76a617b4 Cleanup of set_offsets function.
Adding ALLOWED_REFS_PER_FRAME constant instead of hard coded number 3.

Change-Id: I46146aa837896936f920c748c7d4aa4c27f026e4
2013-04-09 10:17:22 -07:00
Dmitry Kovalev
2a6e09d8fe Merge "Simplification of decoder's code." into experimental 2013-04-09 10:10:29 -07:00
Jingning Han
b3935e8348 Merge "Clamp inferred motion vectors only" into experimental 2013-04-09 09:24:08 -07:00
Dmitry Kovalev
d1cff2deb1 Code cleanup in bitstream code.
Lower case variable names, less code.

Change-Id: I1abc8f592ad2343ab5c76fe2d16262741a4a894a
2013-04-08 19:07:29 -07:00
John Koleszar
e6deea4e60 detokenize: use consistent structure for all block sizes
Restructure the code to avoid the majority of per-block-size
switches, code duplication, etc. All block types (mb/sb32/sb64)
can be handled by the same code.

Change-Id: I4022718d66e31a15a7074e43f3b98cd0a5124ea7
2013-04-08 13:11:40 -07:00
Dmitry Kovalev
5811d7e865 Simplification of decoder's code.
Removing several commented code blocks, using uint32_t and uint8_t types,
removing redundant code.

Change-Id: Ifc5cc9863897925ea2a7cab4f7309ccf28d80bfe
2013-04-08 12:14:40 -07:00
Ronald S. Bultje
f42bee7edf Don't use BLOCKD in vp9_invtrans.c.
Change-Id: I40524170334109e2864b06e3c73c8b34e5aa8b0f
2013-04-08 11:37:29 -07:00
Jingning Han
12bf0796e6 Clamp inferred motion vectors only
Clamp only the motion vectors inferred from neighboring reference
macroblocks. The motion vectors obtained through motion search in
NEWMV mode are constrained during the search process, which allows
a relatively larger referencing region than the inferred mvs.
Hence further clamping the best mv provided by the motion search may
affect the efficacy of NEWMV mode.

Synchronized the decoding process. The decoded mvs in NEWMV modes
should be guaranteed to fit in the effective range. Put a mv range
clamping function there as a safeguard.

This improves the coding performance of high motion sequences, e.g.,
derf set:
foreman 0.233%
husky   0.175%
icd     0.135%
mother_daughter 0.337%
pamphlet        0.561%

stdhd set:
blue_sky 0.408%
city     0.455%
Also saw sunflower go down (-0.469%).

Change-Id: I3fcbba669e56dab779857a8126a91b926e899cb5
2013-04-08 11:37:03 -07:00
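The rule described in 12bf0796e6 can be sketched as below, assuming a simple row/col vector and a rectangular limit. The types mv_sketch and mv_limits and the function finalize_mv are hypothetical; the real encoder derives its limits from the frame geometry, which the commit message does not spell out.

```c
#include <assert.h>

typedef struct { int row, col; } mv_sketch;                     /* hypothetical */
typedef struct { int row_min, row_max, col_min, col_max; } mv_limits;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp only vectors inferred from neighbouring blocks. Vectors found by
 * motion search (NEWMV) are returned unchanged, because the search itself
 * already constrained them to a (larger) valid region. */
static mv_sketch finalize_mv(mv_sketch mv, int inferred,
                             const mv_limits *lim) {
  assert(lim->row_min <= lim->row_max && lim->col_min <= lim->col_max);
  if (inferred) {
    mv.row = clamp_int(mv.row, lim->row_min, lim->row_max);
    mv.col = clamp_int(mv.col, lim->col_min, lim->col_max);
  }
  return mv;
}
```

The decoder has to apply the same rule, otherwise encoder and decoder would reconstruct from different vectors.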
Ronald S. Bultje
aeefa6e194 Fix typo which breaks 4x4 splitmv compound prediction RD code.
0.15% quality increase on derf, particularly noticeable on hard clips
at the higher bitrate end.

Change-Id: I02415a96eb9bbc361cba923069625fae71844bc9
2013-04-08 09:17:52 -07:00
John Koleszar
0e7b7e47c2 Merge "Small cleanup inside setup_loopfilter function." into experimental 2013-04-05 16:13:46 -07:00
John Koleszar
8bbabbea70 Merge "Segmentation code cleanup." into experimental 2013-04-05 16:03:25 -07:00
John Koleszar
fa135d7b9e Merge changes Ibbfa68d6,Idb76a0e2 into experimental
* changes:
  Move EOB to per-plane data
  Move qcoeff, dqcoeff from BLOCKD to per-plane data
2013-04-05 15:56:50 -07:00
Ronald S. Bultje
9161127ee9 Merge "Remove full-pixel-related code." into experimental 2013-04-05 13:46:07 -07:00
Ronald S. Bultje
fd2a747038 Merge "Remove some unused macros." into experimental 2013-04-05 13:46:02 -07:00
Ronald S. Bultje
c6c07d7013 Merge "Remove struct POS." into experimental 2013-04-05 13:45:58 -07:00
Ronald S. Bultje
a9688dfdfb Merge "Remove unused vpx_log() function prototype." into experimental 2013-04-05 13:45:51 -07:00
Ronald S. Bultje
ac28c3169a Merge "Remove "tx_type" member from union b_mode_info." into experimental 2013-04-05 13:45:48 -07:00
Yaowu Xu
2e23c74794 Merge "Removed a speed feature no longer used" into experimental 2013-04-05 13:34:57 -07:00
Yaowu Xu
3dca0d44d2 Merge "make one_shot_q an experiment" into experimental 2013-04-05 13:34:45 -07:00
Ronald S. Bultje
36c3a67c20 Remove full-pixel-related code.
This is a VP8-only feature (part of profile 3) that is unsupported in
VP9.

Change-Id: I78016eede8d9c834d44d4c517f3e8b8fc2a378b1
2013-04-05 12:50:19 -07:00
Dmitry Kovalev
421baef49e Small cleanup inside setup_loopfilter function.
Change-Id: If7fa8aea02f26c2c2bb5daf4e65c3e661d7031ca
2013-04-05 12:48:48 -07:00
Ronald S. Bultje
61834f7325 Remove some unused macros.
Change-Id: Ic219e7878428128e4bb1b3995e8151f92b6bd9c3
2013-04-05 12:40:56 -07:00
Ronald S. Bultje
0732a61c37 Remove struct POS.
It is never used.

Change-Id: If7462357c0498ed05af2645f0c272124381d3aab
2013-04-05 12:38:40 -07:00
Ronald S. Bultje
1cb34c32ed Remove unused vpx_log() function prototype.
Change-Id: Icd6b4322841fefcc86f06645e6aaf1ea42fdfabd
2013-04-05 12:37:45 -07:00
Ronald S. Bultje
5cd235c6cd Remove "tx_type" member from union b_mode_info.
It is never used.

Change-Id: Ibae898c52c766aabf65868611060f9c38fb85b35
2013-04-05 12:36:15 -07:00
Dmitry Kovalev
2c42499513 Segmentation code cleanup.
Cleaning up the code, removing unused vp9_check_segref_inter function and
useless comments.

Change-Id: Ia0e1a3878dc0f9789cba84aeb507a83d9dccd26b
2013-04-05 11:55:52 -07:00
Yaowu Xu
e79a3ff5f3 Removed a speed feature no longer used
Change-Id: Id0c2e44daa936f1d6fb76469fd1bd72a4d7c19fd
2013-04-05 10:43:20 -07:00
John Koleszar
98466e8962 Merge "Simplifying get_delta_q function." into experimental 2013-04-05 09:16:15 -07:00
John Koleszar
05a79f2fbf Move EOB to per-plane data
Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane
structures.

Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
2013-04-04 21:30:23 -07:00
John Koleszar
4c05a051ab Move qcoeff, dqcoeff from BLOCKD to per-plane data
Start grouping data per-plane, as part of refactoring to support
additional planes, and chroma planes with other-than 4:2:0
subsampling.

Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
2013-04-04 16:30:57 -07:00
Yaowu Xu
9780d58e94 make one_shot_q an experiment
so it is configurable to facilitate testing

Change-Id: I247b62736c3a08ec2934793959d1ae605a05efa3
2013-04-04 14:14:51 -07:00
Deb Mukherjee
ffc92da4c2 Fixing the newbintramodes experiment
Adds back special casing B_PRED mode decoding but protected
within the experimental macro.

Change-Id: If98dc8e56b0ecfb1202540c2b7dfdd070cb81ca0
2013-04-04 12:40:55 -07:00
Dmitry Kovalev
52128c5894 Simplifying get_delta_q function.
Change-Id: I3a1e9cc5c3ed5be01ff75a84a6c82ec02c75af9c
2013-04-04 12:10:39 -07:00
Deb Mukherjee
a9e94301f7 Merge "Bugfix in encode_inter_mb_segment_8x8" into experimental 2013-04-04 11:17:48 -07:00
Deb Mukherjee
73031aaa7d Bugfix in encode_inter_mb_segment_8x8
Fixes an indexing bug. Looks like the bug has been there for a while.

Change-Id: I9fc04b0c30754bcb47366ad94a08112925600c4d
2013-04-04 11:07:19 -07:00
Dmitry Kovalev
f857e074d7 Fixing bug introduced by previous commit.
Inside decode_sb_4x4 it should be
"get_tx_type_4x4(mb, y_idx * y_size + x_idx)"
but it was
"get_tx_type_4x4(mb, y_idx * (2 * y_size) + x_idx)".
Also making code of decode_sb_4x4, decode_sb_8x8, and decode_sb_16x16
formatted in the same way.

Change-Id: I15c7bef4fb575f7e9da19f953912324cb35d24dd
2013-04-04 10:49:17 -07:00
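The indexing fix in f857e074d7 comes down to the raster-order formula below. This sketch assumes y_size is the number of 4x4 blocks per row of the superblock; both function names are hypothetical, and the second one only reproduces the faulty expression quoted in the commit message for comparison.

```c
#include <assert.h>

/* Raster index of the block at (y_idx, x_idx) in a grid that is
 * y_size blocks wide. */
static int raster_index(int y_idx, int x_idx, int y_size) {
  assert(y_size > 0 && y_idx >= 0 && x_idx >= 0 && x_idx < y_size);
  return y_idx * y_size + x_idx;
}

/* The faulty form: the row stride is doubled, so every row after the
 * first maps to the wrong block. */
static int raster_index_buggy(int y_idx, int x_idx, int y_size) {
  return y_idx * (2 * y_size) + x_idx;
}
```

The two forms agree on the first row only, which may explain why the bug was not obvious at first glance.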
John Koleszar
ccc0577ab2 Merge "Remove special case vp9_decode_coefs_4x4" into experimental 2013-04-04 07:11:31 -07:00
Paul Wilkins
9b9136f8a2 Fixed incorrect use of compute_qdelta()
This function expects real Q values as inputs,
not index values.

The usage here impacts the Q chosen for forced key
frames. Though this is a bug fix, I have not yet verified
whether the q multiplier value used after the fix is
correct.

Change-Id: I49f6da894d90baeb1e86c820c335f02dc80d3b66
2013-04-04 10:19:16 +01:00
John Koleszar
74e8bd11c2 Merge "Adding decode_sb_16x16 function." into experimental 2013-04-03 21:07:53 -07:00
John Koleszar
4d9dbb2ae8 Merge "Reimplementation of setup_frame_size." into experimental 2013-04-03 21:04:29 -07:00
John Koleszar
0520833591 Merge "Adding setup_pred_probs and read_txfm_mode functions." into experimental 2013-04-03 21:02:59 -07:00
John Koleszar
cbd3b98dd8 Merge "General code cleanup." into experimental 2013-04-03 20:59:51 -07:00
Dmitry Kovalev
d5a017300c General code cleanup.
Making code more readable in different places.

Change-Id: Iea92c9a35e64d257ee358879fc04fc926843d52e
2013-04-03 18:40:17 -07:00
Dmitry Kovalev
50e02b947a Adding decode_sb_16x16 function.
Moving common code from decode_sb32 and decode_sb64 into new
decode_sb_16x16 function.

Change-Id: I57a161300af085557adec2fe600f3c10a145faf2
2013-04-03 18:37:28 -07:00
John Koleszar
4add99aa97 Merge "Motion vector decoder cleanup." into experimental 2013-04-03 18:00:31 -07:00
Dmitry Kovalev
19fb4df8fe Motion vector decoder cleanup.
Better formatting, shorter code, adding read_switchable_filter_type
function.

Change-Id: Ib919b529385cae34c2d682b1c3093518b6942fc1
2013-04-03 17:43:45 -07:00
John Koleszar
1e5f25ecc8 Remove special case vp9_decode_coefs_4x4
This code was only called in the BPRED case, but had no real special
case associated with it. Made BPRED behave like all other modes. No
bitstream change.

Change-Id: I87ba11fe723928b6314d094979011228d5ba006f
2013-04-03 16:12:51 -07:00
Yunqing Wang
dcd3a5c055 Merge "Modify vp9_setup_interp_filters function" into experimental 2013-04-03 14:09:01 -07:00
Yunqing Wang
4ca882f32f Modify vp9_setup_interp_filters function
Took vp9_setup_scale_factors_for_frame() out from
vp9_setup_interp_filters(), so that it is only called once per
frame instead of per macroblock. Decoder tests showed a 1.5%
performance gain.

Change-Id: I770cb09eb2140ab85132f82aed388ac0bdd3a0aa
2013-04-03 13:49:55 -07:00
Dmitry Kovalev
da0232fd59 Reimplementation of setup_frame_size.
General code cleanup in loopfilter code. Modification of setup_frame_size,
so now VP9_COMMON is modified in one place after all width/height checks
passed.

Change-Id: Iedf32df43a912d7aae788ed276ac6c429973f6fe
2013-04-03 12:21:47 -07:00
Dmitry Kovalev
59b2928d40 Adding setup_pred_probs and read_txfm_mode functions.
Decomposing the vp9_decode_frame function, moving code into read_txfm_mode
and setup_pred_probs functions.

Change-Id: I90970dea43cbcef4d6d61fdef267c2094ddee65d
2013-04-03 12:18:15 -07:00
John Koleszar
30d83c4159 Merge "Fix overlapping writes by copy_and_extend_plane" into experimental 2013-04-03 11:54:29 -07:00
John Koleszar
7d67aed16c Merge "Remove unused inplace idct_add functions" into experimental 2013-04-03 11:10:50 -07:00
John Koleszar
8b71b8a6de Merge "Renaming sb32_coded and sb64_coded fields." into experimental 2013-04-02 21:49:03 -07:00
John Koleszar
dc12e6c0dc Merge "Lower case names for struct members." into experimental 2013-04-02 21:27:32 -07:00
John Koleszar
f677b13fb4 Merge "Adding functions with common code for superblock decoding." into experimental 2013-04-02 20:18:13 -07:00
John Koleszar
ede03dfa48 Merge "Code cleanup in vp9_onyx_if.c." into experimental 2013-04-02 20:16:56 -07:00
Dmitry Kovalev
dca8ad178c Renaming sb32_coded and sb64_coded fields.
Renaming sb32_coded to prob_sb32_coded and sb64_coded to prob_sb64_coded.

Change-Id: I6de5cad00a57c3e066d53467f8c38cb6073dce11
2013-04-02 18:21:55 -07:00
John Koleszar
01247f67a7 Fix overlapping writes by copy_and_extend_plane
Broken by refactoring commit 180cd5faa5

Change-Id: I307f6e54d93219a31e7336f1633103ecb25e4832
2013-04-02 14:58:10 -07:00
John Koleszar
42db454c7f Merge branch 'master' into experimental
Conflicts:
	vp9/vp9_common.mk

Change-Id: I2cd5ab47dc31c4210cefc23a282102123d5e2221
2013-04-02 14:54:44 -07:00
Dmitry Kovalev
626635c271 Lower case names for struct members.
Lower case member names inside VP9D_CONFIG and VP9D_COMP structs.

Change-Id: I75af9ad2d929a35c357207a3fd9ebedddabf79c3
2013-04-02 13:34:20 -07:00
Johann
3db60c8c6c Demux vp9_loopfilter_x86.c
Allow more careful targeting of compiler flags.

Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8
2013-04-02 12:49:04 -07:00
John Koleszar
e7b3b692e1 Remove unused inplace idct_add functions
Change-Id: I1c29e041d6db4af4508356315cd65718acb1f668
2013-04-02 12:23:22 -07:00
Johann
6c147b9d93 vp9_sadmxn_x86 only contains SSE2 functions
Rename the file and clean up includes. In the future we would like to
pattern match the files which need additional compiler flags.

Change-Id: I2c76256467f392a78dd4ccc71e6e0a580e158e56
2013-04-02 11:20:55 -07:00
Dmitry Kovalev
9738e2dbd8 Adding functions with common code for superblock decoding.
Adding decode_sb_8x8 and decode_sb_4x4 with common code for superblock
decoding. Renaming decode_superblock32 to decode_sb32 and
decode_superblock64 to decode_sb64.

Change-Id: Id006d7e398b9bfa3acec4326e1e0c537ebfefdd3
2013-04-02 10:42:22 -07:00
Dmitry Kovalev
6f53eee531 Code cleanup in vp9_onyx_if.c.
Using clamp and MIN/MAX functions instead of plain C code. Lower case
variable names. Removing redundant parentheses.

Change-Id: Ibf7cc5fbe4fbdb5029049a599af71534176e6f42
2013-04-02 10:24:56 -07:00
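The kind of rewrite described in 6f53eee531 can be sketched as a before/after pair. The macro and helper definitions below are the usual ones and are an assumption about the tree at this commit; limit_plain and limit_clamped are hypothetical stand-ins for the rate-control code that was touched.

```c
#include <assert.h>

#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))

static int clamp(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Before: bounds enforced with plain C conditionals. */
static int limit_plain(int q, int best, int worst) {
  if (q < best) q = best;
  if (q > worst) q = worst;
  return q;
}

/* After: the same result through the shared helper. */
static int limit_clamped(int q, int best, int worst) {
  assert(best <= worst);
  return clamp(q, best, worst);
}
```

For best <= worst, clamp(q, best, worst) equals MAX(best, MIN(q, worst)), so any of the three spellings can replace the conditional chain.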
John Koleszar
49bc402a94 Merge "Code cleanup." into experimental 2013-04-01 21:12:56 -07:00
John Koleszar
a417a6e32c Merge "Removing redundant function arguments." into experimental 2013-04-01 21:09:48 -07:00
John Koleszar
01e4e0b11d Merge "Code cleanup in block reconstruction code." into experimental 2013-04-01 21:05:35 -07:00
Dmitry Kovalev
e71248addc Code cleanup in block reconstruction code.
Adding recon, recond_sby and recon_sbuv functions.

Change-Id: I6050db233e792e73a3699d18b056eaef9c901d6d
2013-04-01 18:26:58 -07:00
Dmitry Kovalev
50e54c112d Code cleanup.
Adding multiple16 function, removing redundant code, better formatting.

Change-Id: I50195b78ac8ab803e3d05c8fb05a7ca134fab386
2013-04-01 18:23:04 -07:00
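The message for 50e54c112d only names the multiple16 helper. The body below is an assumption based on that name (round up to the next multiple of 16, as is common when aligning frame dimensions to macroblock size), not a copy of the committed code.

```c
#include <assert.h>

/* Round `value` up to the next multiple of 16.
 * Assumed behaviour; requires a non-negative input. */
static int multiple16(int value) {
  assert(value >= 0);
  return (value + 15) & ~15;
}
```

Under that assumption a 1080-line frame would be padded to 1088 lines, while 1920 columns stay unchanged.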
Ronald S. Bultje
cdac4ad4e6 Merge "Calculate SSIM over both reconstruction as well as postproc buffer." into experimental 2013-04-01 17:22:29 -07:00
Ronald S. Bultje
6dd6ffb0bb Calculate SSIM over both reconstruction as well as postproc buffer.
We used to calculate SSIM only over the postproc buffer, whereas we
calculate PSNR for both. Compared to postproc-SSIM, this is about 0.3%
higher for derf, 1.4% lower for hd and 0.5% lower for stdhd, although
it is highly variable on a per-clip basis.

Change-Id: I8dd491f0f5b4201dedfb15d288c854d5d4caa10f
2013-04-01 09:10:27 -07:00
Deb Mukherjee
e3955007df Merge "Framework changes in nzc to allow more flexibility" into experimental 2013-03-29 15:57:27 -07:00
John Koleszar
868ecb55a1 Merge "Tokenization code cleanup." into experimental 2013-03-29 10:55:55 -07:00
John Koleszar
edb1222acb Merge "Extracting common motion vector prediction code." into experimental 2013-03-29 10:43:38 -07:00
John Koleszar
2e181c2d0b Merge "General code cleanup." into experimental 2013-03-29 10:40:34 -07:00
John Koleszar
282a89f329 Merge "Extracting decode_tiles function." into experimental 2013-03-29 10:25:34 -07:00
Yaowu Xu
4b3e59ef0e Merge "define a specific neighborhood for SB64 mv search" into experimental 2013-03-29 09:26:14 -07:00
Yaowu Xu
cbc7ec55a5 Merge "remove code not in use" into experimental 2013-03-29 08:40:29 -07:00
Deb Mukherjee
c5840a8d8e Merge "Reoptimizing the interpolation filters" into experimental 2013-03-29 07:15:05 -07:00
Paul Wilkins
0b4deea896 Merge "Adjust mv_ratio_accumulator threshold." into experimental 2013-03-28 12:53:23 -07:00
Ronald S. Bultje
6cb2fcf601 Merge "Fix mix-up in pt token indexing." into experimental 2013-03-28 12:53:00 -07:00
Yaowu Xu
e071fe15b2 Merge "Fix crash when --tune=ssim is selected." into experimental 2013-03-28 11:23:44 -07:00
Ronald S. Bultje
ed78d1439f Merge "Save nzcstats." into experimental 2013-03-28 09:36:58 -07:00
Deb Mukherjee
fe9b5143ba Framework changes in nzc to allow more flexibility
The patch adds the flexibility to use standard EOB based coding
on smaller block sizes and nzc based coding on larger block sizes.
The tx-sizes that use nzc based coding and those that use EOB based
coding are controlled by a function get_nzc_used().
By default, this function uses nzc based coding for 16x16 and 32x32
transform blocks, which seem to bridge the performance gap
substantially.

All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.

Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
2013-03-28 09:33:50 -07:00
Ronald S. Bultje
9eea9fa206 Fix mix-up in pt token indexing.
This fixes uninitialized reads in the trellis, and probably makes the
trellis do something again.

Change-Id: Ifac8dae9aa77574bde0954a71d4571c5c556df3c
2013-03-28 09:24:29 -07:00
Paul Wilkins
17ef6a8dfd Adjust mv_ratio_accumulator threshold.
This threshold effectively limits the amount of motion
from one end of a GF/ARF group to the other.
This patch makes the threshold depend on image size.

Change-Id: Id45d1d7bced815f86ddd037be53164894b00b82f
2013-03-28 12:49:02 +00:00
Paul Wilkins
befb0393c5 Fix crash when --tune=ssim is selected.
Crash fix only. No functional change or testing.

Change-Id: I0c6d114d024c29fc11ae61666f5938f11b01dd6a
2013-03-28 12:48:30 +00:00
Yaowu Xu
48104f0dfa define a specific neighborhood for SB64 mv search
Change-Id: Ifda91d697c5970c65ce3ec1feac5562124f91782
2013-03-27 16:34:45 -07:00
Dmitry Kovalev
72f9f10cf5 Extracting decode_tiles function.
Extracting decode_tiles function from vp9_decode_frame.

Change-Id: I02a465eeaf76138ef3559e1d46deb452c10e1219
2013-03-27 16:23:12 -07:00
Dmitry Kovalev
17cddb4e26 Removing redundant function arguments.
Almost all arguments for vp9_build_inter32x32_predictors_sb and
vp9_build_inter64x64_predictors_sb can be deduced from the first macroblock
argument.

Change-Id: I5d477a607586d05698d5b3b9b9bc03891dd3fe83
2013-03-27 16:19:27 -07:00
Dmitry Kovalev
52ccff4719 Extracting common motion vector prediction code.
Adding b_mv_pred_row and b_mv_pred_col functions, updating
mi_mv_pred_row and mi_mv_pred_col functions.

Change-Id: I9af068442d4474478375943cc6fce1605d6fc0a5
2013-03-27 14:35:36 -07:00
Dmitry Kovalev
180cd5faa5 General code cleanup.
Removing redundant code, lower case variable names, better indentation,
better parameter names, adding const to readonly parameters.

Change-Id: Ibfdee00f60316fdc5b3f024028c7aaa76a627483
2013-03-27 14:22:30 -07:00
John Koleszar
9ba8aed179 Merge "Extract setup_frame_size and update_frame_context functions." into experimental 2013-03-27 14:21:57 -07:00
Dmitry Kovalev
8c69c193b5 Extract setup_frame_size and update_frame_context functions.
Extracting setup_frame_size and update_frame_context functions. Introducing
vp9_read_prob function as shortcut for (vp9_prob)vp9_read_literal(r, 8).

Change-Id: Ia5c68fd725b2d1b9c5eb20f69cacb62361b5a3dd
2013-03-27 14:04:35 -07:00
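The shortcut introduced in 8c69c193b5 is a thin wrapper around an 8-bit literal read. The sketch below illustrates that shape with a toy MSB-first bit reader. The real vp9_read_literal reads from the boolean (arithmetic) decoder, so bit_reader_sketch, read_bit, read_literal and read_prob are hypothetical stand-ins rather than libvpx code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t prob_sketch; /* stands in for vp9_prob */

typedef struct {
  const uint8_t *data;
  size_t size;    /* bytes available */
  size_t bit_pos; /* next bit to read, counted from the start */
} bit_reader_sketch;

/* Read one bit, most significant bit of each byte first. */
static int read_bit(bit_reader_sketch *r) {
  int bit;
  assert(r->bit_pos < r->size * 8);
  bit = (r->data[r->bit_pos >> 3] >> (7 - (r->bit_pos & 7))) & 1;
  ++r->bit_pos;
  return bit;
}

/* Read an unsigned literal of `bits` bits. */
static int read_literal(bit_reader_sketch *r, int bits) {
  int value = 0;
  while (bits-- > 0) value = (value << 1) | read_bit(r);
  return value;
}

/* The shortcut: an 8-bit literal interpreted as a probability. */
static prob_sketch read_prob(bit_reader_sketch *r) {
  return (prob_sketch)read_literal(r, 8);
}
```

Wrapping the cast and the literal width in one function removes a repeated magic number from the frame-header parsing code.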
Dmitry Kovalev
063628c885 Tokenization code cleanup.
Moving almost identical code to decode_sb32 and decode_sb64 functions.

Change-Id: Id39377aa5106be85d5b0fc3f83586b3779a6c0da
2013-03-27 14:03:56 -07:00
John Koleszar
648f93d59d Merge "Convert inv_tile_order to control interface" into experimental 2013-03-27 13:41:12 -07:00
John Koleszar
7060476ae4 Merge "Convert g_frame_parallel_decoding to control interface" into experimental 2013-03-27 13:41:09 -07:00
Yunqing Wang
d70e6a3679 Merge "Modify idct code to use macro" into experimental 2013-03-27 12:51:41 -07:00
Yunqing Wang
c6c0657c60 Modify idct code to use macro
Small modification of idct code.

Change-Id: I5c4e3223944c68e4ccf762f6cf07c990250e4290
2013-03-27 12:36:08 -07:00
John Koleszar
28d9202ed4 Merge "Cleaning up rate control code." into experimental 2013-03-27 12:29:00 -07:00
Yunqing Wang
0e91bec4b5 Merge "Optimize 32x32 idct function" into experimental 2013-03-27 11:30:48 -07:00
John Koleszar
672b75a103 Convert inv_tile_order to control interface
Restore ABI compatibility with the master branch.

Change-Id: Ie9f6fdf536662bd87dfcf114d16f003422670763
2013-03-27 11:22:20 -07:00
John Koleszar
81708cc326 Convert g_frame_parallel_decoding to control interface
Restore ABI compatibility with the master branch.

Change-Id: Ic57e7e1de09ab33bd37990e52a63ba7c8f1432a4
2013-03-27 11:07:26 -07:00
Yunqing Wang
21a718d9a7 Optimize 32x32 idct function
Wrote an SSE2 version of the vp9_short_idct_32x32 function. Compared
to the C version, the SSE2 version is 5x faster.

Change-Id: I071ab7378358346ab4d9c6e2980f713c3c209864
2013-03-27 11:05:42 -07:00
Ronald S. Bultje
35dc9f5546 Save nzcstats.
Change-Id: I4a3a9eb9f9d17218a0f0d7e148123d34dae879c2
2013-03-27 09:44:47 -07:00
Ronald S. Bultje
513157e093 Scatter-based scantables.
This gains about 0.2% on derf, 0.1% on hd and 0.4% on stdhd. I can put
this under an experimental flag if wanted, just trying to get my patch
queue in shape.

Change-Id: Ibe1a30fe0e0b07bec4802e0f3ff0ba22e505f576
2013-03-27 09:44:45 -07:00
Ronald S. Bultje
7c70145914 Merge "Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs." into experimental 2013-03-26 19:17:08 -07:00
Ronald S. Bultje
3c77ab4c0f Merge "Redo banding for all transforms." into experimental 2013-03-26 19:16:44 -07:00
Ronald S. Bultje
c6efbbcfe4 Merge "Use above/left (instead of previous in scan-order) as token context." into experimental 2013-03-26 19:16:24 -07:00
Deb Mukherjee
23144d2345 Implicit weighted prediction experiment
Adds an experiment to use a weighted prediction of two INTER
predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8),
(1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly
based on consistency of the predictors to the already
reconstructed pixels to the top and left of the current macroblock
or superblock.

Currently the weighting is not applied to SPLITMV modes, which
default to the usual (1/2, 1/2) weighting. However the code is in
place controlled by a macro. The same weighting is used for Y and
UV components, where the weight is derived from analyzing the Y
component only.

Results (over compound inter-intra experiment)
derf: +0.18%
yt: +0.34%
hd: +0.49%
stdhd: +0.23%

The experiment suggests a bigger benefit for explicitly signaled weights.

Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a
2013-03-26 16:58:56 -07:00
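The weighting scheme in 23144d2345 can be sketched with the five weight pairs expressed in eighths. The commit says the pair is chosen from how well the predictors agree with already reconstructed pixels above and to the left, but it does not state the metric; the sum of absolute differences used here is an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_WEIGHTS 5
/* (1/4,3/4), (3/8,5/8), (1/2,1/2), (5/8,3/8), (3/4,1/4) in eighths. */
static const int kWeights8[NUM_WEIGHTS][2] = {
  { 2, 6 }, { 3, 5 }, { 4, 4 }, { 5, 3 }, { 6, 2 }
};

/* Weighted average of two predictor samples, rounded to nearest. */
static uint8_t blend(uint8_t p0, uint8_t p1, int w) {
  return (uint8_t)((kWeights8[w][0] * p0 + kWeights8[w][1] * p1 + 4) >> 3);
}

static long border_sad(const uint8_t *b0, const uint8_t *b1,
                       const uint8_t *recon, int n, int w) {
  long sad = 0;
  int i;
  for (i = 0; i < n; ++i) sad += labs((long)blend(b0[i], b1[i], w) - recon[i]);
  return sad;
}

/* Pick the weight index whose blend of the two predictors' border samples
 * is closest to the reconstructed border. Equal weighting wins ties. */
static int pick_weight(const uint8_t *b0, const uint8_t *b1,
                       const uint8_t *recon, int n) {
  int best = 2, w;
  long best_sad = border_sad(b0, b1, recon, n, best);
  for (w = 0; w < NUM_WEIGHTS; ++w) {
    const long sad = border_sad(b0, b1, recon, n, w);
    if (sad < best_sad) {
      best_sad = sad;
      best = w;
    }
  }
  return best;
}
```

Because the decoder sees the same reconstructed border, it can repeat the selection without any weight being signaled in the bitstream, which is what makes the scheme implicit.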
Ronald S. Bultje
d9094d8fd3 Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs.
These are mostly just for experimental purposes. I saw small gains (in
the 0.1% range) when playing with this on derf.

Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
2013-03-26 16:46:13 -07:00
Ronald S. Bultje
3120dbddb1 Redo banding for all transforms.
Now that the first AC coefficients in both directions use the same DC
as their context, there no longer is a purpose in letting both have
their own band. Merging these two bands allows us to split bands for
some of the very high-frequency AC bands.

In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
don't think the old banding made any sense at all (it merged the last
coefficient of the first row/col in the same band as the first two of
the second row/col), which was clearly an oversight from the band being
applied in scan-order (rather than in their actual position). Now,
coefficients at the same position will be in the same band, regardless
of what scan order is used. I think this makes most sense for the purpose
of banding, which is basically "predict energy for this coefficient
depending on the energy of context coefficients" (i.e. pt).

After full re-training, together with previous patch, derf gains about
1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.

Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
2013-03-26 16:46:13 -07:00
Ronald S. Bultje
790fb13215 Use above/left (instead of previous in scan-order) as token context.
Pearson correlation for above or left is significantly higher than for
previous-in-scan-order (absolute values depend on position in scan, but
in general, we gain about 0.1-0.2 by using either above or left; using
both basically just makes this even better). For eob branch skipping,
we continue to use the previous token in scan order.

This helps about 0.9% on derf after re-training on a limited data set.
Full re-training and results on larger-resolution clips are pending.

Note that this commit breaks trellis, so we can probably get further
gains out of it by fixing trellis at some later point.

Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
2013-03-26 16:46:09 -07:00
Deb Mukherjee
57c97e2a5b Reoptimizing the interpolation filters
Reoptimizes the 8-tap smooth filter.

Results:
derf: +0.101%
yt: +0.157%
hd: +0.791%
stdhd: +0.264%

The next step will be to reoptimize the other two filters.

Change-Id: I3d256a510ad9c7c30c33fae4a70fb43dfc708ed0
2013-03-26 16:34:35 -07:00
Yaowu Xu
43df87e841 remove code not in use
Change-Id: I4fa46f10e82aca36c563f7ea829e5a3177a0c740
2013-03-26 15:27:35 -07:00
John Koleszar
646616602d Merge "Cleaning up loopfilter code." into experimental 2013-03-26 12:40:37 -07:00
Dmitry Kovalev
77c664ade3 Cleaning up rate control code.
Lower case variable names, declaration and initialization on the same line,
removing redundant casts to double.

Change-Id: I7ea3905bed827aa6faac11a78401b85e448b57f9
2013-03-26 11:25:58 -07:00
Dmitry Kovalev
d7209b3a0a Cleaning up loopfilter code.
Lower case variable names, removing redundant variables, declaration and
initialization on the same line.

Change-Id: Ie0c6c95b14103990eb6a9d7784f8259c662e1251
2013-03-26 11:09:58 -07:00
Dmitry Kovalev
4a3d786019 Decomposition of vp9_decode_frame function.
Moving code from vp9_decode_frame function into setup_loopfilter and
setup_segmentation functions. A little bit of cleanup.

Change-Id: I2cce1813e4d7aeec701ccf752bf57e3bdd41b51c
2013-03-26 11:04:25 -07:00
John Koleszar
8e1c368486 Merge "Add an in-loop deringing experiment" into experimental 2013-03-26 08:36:55 -07:00
John Koleszar
7d9a7fb297 Merge "Code cleanup." into experimental 2013-03-26 08:34:06 -07:00
John Koleszar
f0923f3b01 Merge "Code cleanup." into experimental 2013-03-26 08:30:46 -07:00
John Koleszar
49c5841b2b Merge "Changing initialization order of mb_to_top_edge & mb_to_bottom_edge" into experimental 2013-03-26 08:25:45 -07:00
John Koleszar
441e2eab1b Add an in-loop deringing experiment
Adds a per-frame, strength adjustable, in loop deringing filter. Uses
the existing vp9_post_proc_down_and_across 5 tap thresholded blur
code, with a brute force search for the threshold.

Results almost strictly positive on the YT HD set, either having no
effect or helping PSNR in the range of 1-3% (overall average 0.8%).
Results more mixed for the CIF set, (-0.5 min, 1.4 max, 0.1 avg).
This has an almost strictly negative impact to SSIM, so examining a
different filter or a more balanced search heuristic is in order.

Other test set results pending.

Change-Id: I5ca6ee8fe292dfa3f2eab7f65332423fa1710b58
2013-03-26 08:23:24 -07:00
Deb Mukherjee
d14c7265f1 Bugfix in model coef prob experiment
Fixes an issue with model based update that got into
the original patch that was merged.

Change-Id: Ie42d3d0aff2e48cd187d96664dbd3e9d6d3ac22f
2013-03-26 07:30:42 -07:00
Deb Mukherjee
49dcc71493 Merge "Modeling default coef probs with distribution" into experimental 2013-03-26 07:13:13 -07:00
Deb Mukherjee
fd18d5dffe Modeling default coef probs with distribution
Replaces the default tables for single coefficient magnitudes with
those obtained from an appropriate distribution. The EOB node
is left unchanged. The model is represented as a 256-size codebook
where the index corresponds to the probability of the Zero or the
One node. Two variations are implemented corresponding to whether
the Zero node or the One-node is used as the peg. The main advantage
is that the default prob tables will become considerably smaller and
manageable. Besides there is substantially less risk of over-fitting
for a training set.

Various distributions are tried and the one that gives the best
results is the family of Generalized Gaussian distributions with
shape parameter 0.75. The results are within about 0.2% of fully
trained tables for the Zero peg variant, and within 0.1% of the
One peg variant.

The forward updates are optionally (controlled by a macro)
model-based, i.e. restricted to only convey probabilities from the
codebook. Backward updates can also be optionally (controlled by
another macro) model-based, but this is turned off by default. Currently
model-based forward updates work about the same as unconstrained
updates, but there is a drop in performance with backward-updates
being model based.

The model based approach also allows the probabilities for the key
frames to be adjusted from the defaults based on the base_qindex of
the frame. Currently the adjustment function is a placeholder that
adjusts the prob of EOB and Zero node from the nominal one at higher
quality (lower qindex) or lower quality (higher qindex) ends of the
range. The rest of the probabilities are then derived based on the
model from the adjusted prob of zero.

Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
2013-03-25 23:43:38 -07:00
Dmitry Kovalev
3644a5b632 Code cleanup.
Fixing function arguments alignment, reusing MIN/MAX and clamp functions.

Change-Id: I87dd5a40ffb65b521b8abbf0fccf2f50552c5309
2013-03-25 15:16:14 -07:00
Dmitry Kovalev
7cc14e598e Code cleanup.
Lower case variable names, code simplification by using already defined
clamp and read_le16 functions.

Change-Id: I8fd544365bd8d1daed86d7b2ae0843e4ef80df08
2013-03-25 14:24:26 -07:00
Yunqing Wang
f68350ca98 Merge "Optimize 16x16 idct10 function" into experimental 2013-03-22 11:17:32 -07:00
Paul Wilkins
99a4939ec3 Merge "Disable zero bin mode boost." into experimental 2013-03-22 10:59:43 -07:00
Paul Wilkins
bfe7666142 Merge "Minor code clean up" into experimental 2013-03-22 10:53:12 -07:00
Paul Wilkins
ec080fa9de Disable zero bin mode boost.
As things stand the zero bin mode boost is hurting somewhat.
In part this seems to be because the boost applied as is
interferes with the rd mode selection loop.

Average gains (derf 0.072, yt 0.243, ythd 0.179 std-hd 0.212%)

Change-Id: Icaecea3908d9a7352370e49b8fa822f2c2c49dc1
2013-03-22 17:43:43 +00:00
Paul Wilkins
815734e5fb Minor code clean up
Change-Id: Ifa864e0acb253b238b03cdeed0fe5d6ee30a45d8
2013-03-22 17:42:45 +00:00
Paul Wilkins
52abaeca85 Merge "Remove TX size segment feature" into experimental 2013-03-22 10:39:22 -07:00
Yunqing Wang
869d6c0534 Optimize 16x16 idct10 function
Wrote sse2 version of vp9_short_idct10_16x16 function. Compared
to c version, the sse2 version is 2.3X faster.

Change-Id: I314c4f09369648721798321eeed6f58e38857f26
2013-03-21 16:36:01 -07:00
Dmitry Kovalev
407940243f Changing initialization order of mb_to_top_edge & mb_to_bottom_edge
Making consistent initialization of mb_to_{top,bottom,left,right}_edge
variables after set_mb_row & set_mb_col calls. A little bit of code cleanup
additionally.

Change-Id: I245bfe32c5701e9836956dc25cf8c770d109cbc1
2013-03-21 12:51:57 -07:00
Yunqing Wang
8a3233b54d Merge "Optimize 16x16 idct function" into experimental 2013-03-21 11:54:20 -07:00
Yunqing Wang
ec3100661c Optimize 16x16 idct function
Wrote sse2 version of vp9_short_idct16x16 function. Compared to c
version, the sse2 version is over 2.5X faster.

Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61
2013-03-21 11:44:05 -07:00
Dmitry Kovalev
56f3a2c663 Code cleanup: lower case variable names.
Renaming Width to width, Height to height and Version to version in
several structs and function signatures.

Change-Id: I084c3f7e747cb2ce3345aff27a3dff9b13a87543
2013-03-20 16:41:30 -07:00
Dmitry Kovalev
66eff0aa38 Merge "Motion vector code cleanup." into experimental 2013-03-19 11:17:22 -07:00
Paul Wilkins
1c75e77b6d Remove TX size segment feature
Change-Id: I0d226e4cb240caced37230f46905bf69b46e0cce
2013-03-19 17:31:08 +00:00
Paul Wilkins
d8ffee4526 Changes to rd error_per_bit calculation.
Specifically changes to retain more precision
especially at low Q through to the point of use.

Change-Id: Ief5f010f2ca4daaabef49520e7edb46c35daf397
2013-03-18 23:07:51 +00:00
Ronald S. Bultje
a5b54d73e4 Merge "Fix ENTROPY_STATS code in vp9_tokenize.c." into experimental 2013-03-18 15:58:33 -07:00
Ronald S. Bultje
b99dce6881 Fix ENTROPY_STATS code in vp9_tokenize.c.
Change-Id: I9b4cb1e2ce6c6a99cffd473ff2fa7579bd318fcd
2013-03-18 15:39:04 -07:00
Yunqing Wang
6344c84c82 Optimize 8x8 idct function
Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8.
Compared to c version, the sse2 version is 2X faster. The decoder
test didn't show noticeable gain since 8x8 idct doesn't take much
of decoding time (less than 1% in my test).

Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
2013-03-18 15:34:14 -07:00
John Koleszar
93529bd7c1 Merge "Replace scaling byte with explicit display size" into experimental 2013-03-18 13:02:07 -07:00
John Koleszar
8a3f55f2d4 Replace scaling byte with explicit display size
If the intended display size is different than the size the frame is
coded at, then send that size explicitly in the bitstream. Adds a new
bit to the frame header to indicate whether the extra size fields
are present.

Change-Id: I525c66f22d207efaf1e5f903c6a2a91b80245854
2013-03-18 12:02:20 -07:00
Paul Wilkins
ef179bce61 Merge "Adapt ARNR filter length and strength." into experimental 2013-03-18 12:00:39 -07:00
John Koleszar
c5b317057b Merge "Fix pulsing issue with scaling" into experimental 2013-03-18 11:57:36 -07:00
John Koleszar
e5d7542447 Merge "Add VP9_GET_REFERENCE control" into experimental 2013-03-18 11:57:31 -07:00
Paul Wilkins
cdb322dd72 Adapt ARNR filter length and strength.
Adjust the filter length and strength for each
ARF group based on a measure of difficulty (the boost)
and the active q range.

Remove lower limit on RDMULT value.

Average gains on the different sets in range 0.4%-0.9%.
However the ARNR changes give a very big boost on a
few clips.

Eg. Soccer ~5%, in derf set and Cyclist ~ 10% in the std-hd set

Change-Id: I2078d78798e27ad2bcc2b32d703ea37b67412ec4
2013-03-18 16:17:04 +00:00
Yaowu Xu
d29f5435df Merge "put refmvselection under experiment" into experimental 2013-03-18 08:51:33 -07:00
Yaowu Xu
12ade55719 Merge "removed reference to "LLM" and "x8"" into experimental 2013-03-18 08:51:19 -07:00
John Koleszar
9a56ea7e46 Merge "Remove some unused rate control variables" into experimental 2013-03-18 08:36:23 -07:00
John Koleszar
571fce6546 Merge "Fix use of NaN in firstpass" into experimental 2013-03-18 08:36:18 -07:00
Deb Mukherjee
bf7387f6b7 Merge "Context-pred fix to not use top/left on edges" into experimental 2013-03-16 19:09:25 -07:00
Deb Mukherjee
b1921b2f08 Context-pred fix to not use top/left on edges
This fix resolves some of the mismatch issues being seen
recently. While this is the right thing to do when tiling
is used for this experiment, it is not the underlying cause
of the mismatches.
Something else is causing writing outside of the allowable
frame area in the encoder leading to this mismatch.

Change-Id: If52c6f67555aa18ab8762865384e323b47237277
2013-03-16 09:26:52 -07:00
John Koleszar
b8ac9f2f2c Remove some unused rate control variables
These variables are unused, and are subject to overflowing, causing
assertions when built with -ftrapv.

Change-Id: Ia00a3201af309906c05bcd4b23a643925ed6ea86
2013-03-15 17:53:45 -07:00
John Koleszar
db5f2cb57b Fix use of NaN in firstpass
If the second reference is better than the first in the long term,
it was possible to try to take the fractional exponent of a
negative number, giving an undefined result.

Change-Id: I1dd08286747ceae960eb03bb5d98a383cc9d253b
2013-03-15 17:53:38 -07:00
John Koleszar
117514b30f Merge "Cleaning up frame decoding functionality." into experimental 2013-03-15 17:44:32 -07:00
Christian Duvivier
4418b790a7 Faster vp9_short_fdct16x16.
Scalar path is about 1.5x faster (3.1% overall encoder speedup).
SSE2 path is about 7.2x faster (7.8% overall encoder speedup).

Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289
2013-03-15 15:55:31 -07:00
Dmitry Kovalev
4a0686e716 Motion vector code cleanup.
Moving identical code to separate functions, variable declaration and
initialization on the same line.

Change-Id: Ifa6474a64189f9d8051e88e19850453b0227752c
2013-03-15 13:16:58 -07:00
Yaowu Xu
82fe8c9f36 Merge "force lossless coding at very high quality end" into experimental 2013-03-14 19:05:17 -07:00
Yaowu Xu
5d9ba7938e Merge "Remove leftover reference to 2nd order dc/ac quant" into experimental 2013-03-14 19:05:11 -07:00
Dmitry Kovalev
9285703e86 Cleaning up frame decoding functionality.
Change-Id: I25424904fb8541fc19d00d9fbc592379374b98c0
2013-03-14 12:31:54 -07:00
Yaowu Xu
374a17366e force lossless coding at very high quality end
Change-Id: I75fc4eee10bee9efd419d248827290cce8e6d637
2013-03-14 12:31:27 -07:00
Yaowu Xu
f4d2ad6915 Remove leftover reference to 2nd order dc/ac quant
Change-Id: Ib8dacf1d2797743569771b8f699e40e1aeb085cb
2013-03-14 10:46:15 -07:00
John Koleszar
9b7be88883 Fix pulsing issue with scaling
Updates the YV12_BUFFER_CONFIG structure to be crop-aware. The
existing width/height parameters are left unchanged, storing the
width and height aligned to a 16 byte boundary. The cropped
dimensions are added as new fields.

This fixes a nasty visual pulse when switching between scaled and
unscaled frame dimensions due to a mismatch between the scaling
ratio and the 16-byte aligned sizes.
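
The mismatch can be sketched as follows, under stated assumptions: `frame_dims` is a reduced, hypothetical stand-in for the buffer structure, and the field names, `align16`, and the 14-bit fixed-point scale factor are illustrative rather than taken from this commit.

```c
#include <assert.h>

/* Hypothetical, reduced frame description. */
typedef struct {
  int y_width;       /* allocated width, rounded up to a multiple of 16 */
  int y_height;      /* allocated height, rounded up to a multiple of 16 */
  int y_crop_width;  /* width of the picture that is actually shown */
  int y_crop_height; /* height of the picture that is actually shown */
} frame_dims;

/* Round up to the next multiple of 16. */
int align16(int v) { return (v + 15) & ~15; }

frame_dims make_frame_dims(int width, int height) {
  frame_dims d;
  assert(width > 0 && height > 0);
  d.y_width = align16(width);
  d.y_height = align16(height);
  d.y_crop_width = width;
  d.y_crop_height = height;
  return d;
}

/* Horizontal scale factor with 14 fractional bits, computed from the
 * cropped sizes so that alignment padding does not distort the ratio. */
int scale_x_q14(const frame_dims *ref, const frame_dims *cur) {
  return (int)(((long long)ref->y_crop_width << 14) / cur->y_crop_width);
}
```

For a 720-wide reference and a 360-wide current frame, the cropped ratio is exactly 2, while the aligned widths (720 and 368) give a slightly smaller ratio.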

Change-Id: Id4a3f6aea6b9b9ae38bdfa1b87b7eb2cfcdd57b6
2013-03-13 19:10:10 -07:00
John Koleszar
b3c350a1a9 Add VP9_GET_REFERENCE control
This is like VP8_COPY_REFERENCE, but returns a pointer to the reference
frame rather than a copy of it. This is useful when the application
doesn't know what the size of the reference is, as is the case when
scaling is in effect.

Change-Id: I63667109f65510364d0e397ebe56217140772085
2013-03-13 19:08:06 -07:00
Jingning Han
76c12ab9c9 Support +/-2048 motion vector coding
Enable entropy coding of motion vectors up to +/-2048. Also
extend the motion search range accordingly.

Change-Id: Iac2bb015e8934521cef83a19edbe967d9f097436
2013-03-13 14:08:27 -07:00
Yaowu Xu
88862c0454 put refmvselection under experiment
and turn the experiment off by default.

Change-Id: If9e684aa6cc49eacd39f36645a110a447e38d2de
2013-03-13 10:40:31 -07:00
Yaowu Xu
005552639b removed reference to "LLM" and "x8"
The commit changed the name of files and function to remove obsolete
reference to LLM and x8.

Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
2013-03-13 08:35:46 -07:00
John Koleszar
bd9cd9a185 fix superframe index marker masks
The superframe index marker byte carries data in the lower 5 bits. Only the
upper 3 should be used as part of the mask to detect it. By masking with
0xf0, the previous code was incorrect for frames over 65k bytes.
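
A small sketch of why the mask matters. The bit layout used here is an assumption for illustration: the top three bits hold the fixed pattern 110, bits 4..3 hold the number of bytes per frame size minus one, and bits 2..0 hold the frame count minus one. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a marker byte under the assumed layout described above. */
uint8_t make_marker(int bytes_per_size, int frames) {
  assert(bytes_per_size >= 1 && bytes_per_size <= 4);
  assert(frames >= 1 && frames <= 8);
  return (uint8_t)(0xc0 | ((bytes_per_size - 1) << 3) | (frames - 1));
}

/* Correct test: only the upper 3 bits identify the marker. */
int is_marker(uint8_t b) { return (b & 0xe0) == 0xc0; }

/* Old test: 0xf0 also inspects bit 4, which carries data. */
int is_marker_old(uint8_t b) { return (b & 0xf0) == 0xc0; }
```

A frame larger than 65535 bytes needs at least three bytes for its size, which sets bit 4 in this layout, so the 0xf0 mask no longer recognizes the marker.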

Change-Id: I6248889f5af227457f359a56b2348ef6db87a3b4
2013-03-12 19:04:32 -07:00
John Koleszar
c11313e31e fix superframe index with lagged encoding
If a superframe (ARF) is generated while flushing the lagged frames
at the end of the clip, the buffer pointer wasn't being properly
updated to account for the size of the index, causing the next
frame to overwrite the index on the previous frame.

Change-Id: Ib158cc8e4183d663bdfb9ba002dd4c98916abdc9
2013-03-12 16:33:38 -07:00
Paul Wilkins
a2c6f6e945 Merge "disambiguate superframe index in vp9_stop_encode()" into experimental 2013-03-12 16:00:30 -07:00
John Koleszar
872fc3ded8 disambiguate superframe index in vp9_stop_encode()
If the bool-coded partition naturally ends in a byte that matches the
superframe index marker, it could lead to a parse error. This commit
ensures that if such a marker is seen, it is padded out with an
additional zero byte to disambiguate it.
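
The padding rule can be sketched like this. It assumes the marker is recognized by the pattern 110 in the upper three bits of a byte; `pad_if_ambiguous` and its parameters are hypothetical and are not the interface of vp9_stop_encode().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If the last coded byte could be mistaken for a superframe index
 * marker, append one zero byte. Returns the new length. */
size_t pad_if_ambiguous(uint8_t *buf, size_t len, size_t capacity) {
  if (len > 0 && (buf[len - 1] & 0xe0) == 0xc0) {
    assert(len < capacity); /* caller must leave room for one byte */
    buf[len++] = 0;
  }
  return len;
}
```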

Change-Id: Id977de05745b6fa9ef08afb71e210a2a3ecca02e
2013-03-12 14:30:23 -07:00
Paul Wilkins
21ba242bfd Merge "Change buffer update rules on ARF overlay." into experimental 2013-03-12 11:07:20 -07:00
Ronald S. Bultje
8fc3ab7c62 Merge "Fix typo in comment for number of extra bits for cat6 tokens." into experimental 2013-03-12 10:45:12 -07:00
John Koleszar
5c1e57c3ce Merge "fix an assumption about uv_stride" into experimental 2013-03-12 10:44:31 -07:00
Ronald S. Bultje
516f7ac04e Fix typo in comment for number of extra bits for cat6 tokens.
Change-Id: I07ddf3be8bc5d6c2eb561d4241879777c315b183
2013-03-12 10:25:43 -07:00
Paul Wilkins
49d1425d19 Merge "Changes to maximum gf/arf interval." into experimental 2013-03-12 09:59:43 -07:00
Paul Wilkins
8be3056c45 Change buffer update rules on ARF overlay.
When coding the frame that corresponds to the midpoint frame
defining an ARF, do not update the last reference frame buffer.
Previously this buffer was updated meaning that when coding the next
ARF all the reference buffers were the same (or nearly so).
Turning the update off means that the frame before is still available
as an alternative predictor and for use in compound prediction.

Also fixed inconsistency in test for mismatch (patch from JK).

Net average gains (derf 0.049, yt 0.163, yt-hd 0.207, std-hd 0.286)

Change-Id: Ifee21da21ccbb1648ac2eafe890d3ce60562c7bc
2013-03-12 16:57:39 +00:00
John Koleszar
045c53f51e fix an assumption about uv_stride
Use the uv_stride from the framebuffer rather than deriving it from the
y_stride.

Change-Id: I94581cb741539d094ff062b3d008235556903b8c
2013-03-12 09:22:44 -07:00
Dmitry Kovalev
ff553ba113 Merge "Code cleanup." into experimental 2013-03-11 17:22:21 -07:00
Dmitry Kovalev
2891d70b23 Code cleanup.
Removing redundant code, introducing new functions for better
decomposition, adding 'clamp' function to vp9_common.h.

Change-Id: Ic3b8ca13bbc38f60f0c9c43910b5802005e31aaf
2013-03-11 17:02:27 -07:00
John Koleszar
a07eb47b25 Merge "Reinitialize motion search tables on frame size change" into experimental 2013-03-11 16:32:03 -07:00
John Koleszar
0a18228274 Merge "Add 'superframe' index" into experimental 2013-03-11 16:31:48 -07:00
Paul Wilkins
08d2c3829a Changes to maximum gf/arf interval.
This patch puts in an adjustment to the maximum gf/arf
interval based on the active q range.  It sets a fixed
baseline maximum of 16 but can drop this down to 12 at
lower q. This required some re-ordering in the first pass
code to ensure we have a Q range estimate before defining
the first gf sequence.

The main gains seen are in the STD hd set on 50fps clips
where previously the interval could rise as high as 25.
On the std hd clip the gains are around 2.8% with limit set
to 300 frames.

When combined with the one shot rate control flags we get
combined gains of:

derf 1.55% (limit300), yt 7.25%, hd 5.17% std-hd 5.84% (limit300)

Change-Id: Ib380d51354511f2ff0f171a8df4e74291c0421f9
2013-03-11 19:25:10 +00:00
John Koleszar
9b4095c537 Fix vp9_tree_probs_from_distribution with CONFIG_CODE_NONZEROCOUNT
The automatic merge result was incomplete.

Change-Id: I8976318bfc346d867660a013a302c80edb25fc29
2013-03-11 11:03:36 -07:00
John Koleszar
52fc4f8a78 Merge "Simplify vp9_adapt_nmv_probs" into experimental 2013-03-11 09:57:53 -07:00
John Koleszar
ee4649ded2 Simplify vp9_adapt_nmv_probs
Remove the temporary branch count arrays and build the adapted probabilities
while walking the tree. Gives an additional 1.5% or so on CIF.

Change-Id: I875d61e5e0ec778e5d2f7f9d0837b989a91cf3a3
2013-03-11 09:44:22 -07:00
Deb Mukherjee
fad43d4249 Merge "Minor optimization in mv entropy adaptation" into experimental 2013-03-11 09:43:54 -07:00
John Koleszar
e6257342b1 Merge "Optimize vp9_tree_probs_from_distribution" into experimental 2013-03-11 09:32:11 -07:00
Deb Mukherjee
f74c55eb03 Minor optimization in mv entropy adaptation
Adds a check to exit from the increment_nmv_count function when the
increment is 0.

Change-Id: I99c1e342d351f7800e23590f9c2419881bf1d708
2013-03-11 08:49:14 -07:00
John Koleszar
bd84685f78 Optimize vp9_tree_probs_from_distribution
The previous implementation visited each node in the tree multiple times
because it used each symbol's encoding to revisit the branches taken and
increment its count. Instead, we can traverse the tree depth first and
calculate the probabilities and branch counts as we walk back up. The
complexity goes from somewhere between O(nlogn) and O(n^2) (depending on
how balanced the tree is) to O(n).
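
A minimal sketch of the single-pass idea, not the function from this commit. It assumes the tree is stored as an array of child pairs, where a value greater than zero is the index of another pair and a value of zero or less is a leaf holding the negated token. The names `count_branches` and `branch_prob`, and the rounding used for the probability, are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t tree_index;

/* Depth-first walk: each node pair is visited exactly once. On the way
 * back up, the counts of both subtrees are stored for that pair and
 * their sum is returned to the parent. */
unsigned int count_branches(const tree_index *tree, tree_index i,
                            const unsigned int *token_counts,
                            unsigned int (*branch_ct)[2]) {
  unsigned int side[2];
  for (int b = 0; b < 2; ++b) {
    const tree_index child = tree[i + b];
    side[b] = child <= 0
                  ? token_counts[-child]
                  : count_branches(tree, child, token_counts, branch_ct);
  }
  branch_ct[i >> 1][0] = side[0];
  branch_ct[i >> 1][1] = side[1];
  return side[0] + side[1];
}

/* Probability of taking branch 0, scaled to 1..255 (128 if no data). */
uint8_t branch_prob(unsigned int c0, unsigned int c1) {
  const unsigned int total = c0 + c1;
  if (total == 0) return 128;
  const unsigned int p = (c0 * 256 + total / 2) / total;
  return (uint8_t)(p < 1 ? 1 : (p > 255 ? 255 : p));
}
```

With a three-token tree `{0, 2, -1, -2}` and counts `{10, 20, 30}`, the root pair ends up with counts (10, 50) and the inner pair with (20, 30), each computed once.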

Only tested one clip (256kbps, CIF), saw 13% decoding perf improvement.

Note that this optimization should port trivially to VP8 as well. In VP8,
the decoder doesn't use this function, but it does routinely show up
on the profile for realtime encoding.

Change-Id: I4f2848e4f41dc9a7694f73f3e75034bce08d1b12
2013-03-10 13:39:30 -07:00
Deb Mukherjee
a28139c849 Continued experiment with nonzero count
Adds probability updates for extra bits for the nzcs, code for
getting nzc stats, plus some minor cleanups and fixes.

Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
2013-03-08 16:37:08 -08:00
Ronald S. Bultje
0643c3f133 Merge "Add support for tx_select in i8x8 encoding in keyframes." into experimental 2013-03-08 16:25:27 -08:00
Yunqing Wang
cb7acbc0e1 Merge "Add vp9_idct4_1d_sse2" into experimental 2013-03-08 15:14:02 -08:00
Yunqing Wang
11ca81f8b6 Add vp9_idct4_1d_sse2
Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also,
modified the parameter type passed to vp9_short_iht functions to
make it work with rtcd prototype.

Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe
2013-03-08 15:04:22 -08:00
Dmitry Kovalev
3edbc77ae3 Merge "Consistent usage of ROUND_POWER_OF_TWO macro." into experimental 2013-03-08 11:35:22 -08:00
Yunqing Wang
2e0553227e Merge "Optimize add_constant_residual function" into experimental 2013-03-08 10:18:52 -08:00
Jingning Han
2a5278bdbd Extend diff MV limit from +/-256 to +/-1024
Increase the motion search range by 4x. Change MV_CLASS tree of the
entropy coding to allow two additional mv classes to cover the
extended motion vector limit. The codec determines the effective
motion search range conditioned on the actual frame dimension.

It provides coding gains:

stdhd 0.39%
yt    0.56%
hd    0.47%

Major coding performance gains are packed in several sequences with
intense motion activities, e.g., ped_1080p gains 7% at high bit-rates,
and on average 3%.

TODO: Need to further tune the rate control and motion search units.

Change-Id: Ib842540a6796fbee5a797809433ef6a477c6d78d
2013-03-08 10:04:36 -08:00
Ronald S. Bultje
b41dee8428 Add support for tx_select in i8x8 encoding in keyframes.
Also enable tx_select for keyframes.

Change-Id: Iadb1231d9fa7af0c8dce3d9b41830b93a302479e
2013-03-08 09:28:46 -08:00
Yunqing Wang
f240782650 Optimize add_constant_residual function
Optimized adding constant diff to predictor, which gave about
2% decoder performance gain.

Change-Id: I47db20c31428e8c4a8f16214a85cbe386a6e9303
2013-03-07 15:49:07 -08:00
Yunqing Wang
6fdd4d26de Merge "Allocate 16-byte aligned diff buffer" into experimental 2013-03-07 15:40:38 -08:00
Yunqing Wang
b339aea675 Allocate 16-byte aligned diff buffer
This was done based on John's suggestion.

Change-Id: I62516a513c31fe3dbea0d6cd063df79d9e819ec8
2013-03-07 15:29:27 -08:00
Dmitry Kovalev
3603dfb62c Consistent usage of ROUND_POWER_OF_TWO macro.
Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
2013-03-07 12:24:35 -08:00
Ronald S. Bultje
89e4ce20d0 Update ADST selection if tx_size < block_size.
Change-Id: Ic9b336486774c95ffbb92adcb110cc0fc2a83cc5
2013-03-07 11:19:15 -08:00
Ronald S. Bultje
d3724abe9f Re-add support for ADST in superblocks.
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).

Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
2013-03-07 11:19:10 -08:00
Yunqing Wang
3162371544 Fix issue in add_residual intrinsic function
Yaowu found this function had a compiling issue with MSVC because
of using _mm_storel_pi((__m64 *)(dest + 0 * stride), (__m128)p0).
To be safe, changed back to use integer store instruction.

Also, for some builds, diff could not always be 16-byte aligned.
Changed that in the code.

Change-Id: I9995e5446af15dad18f3c5c0bad1ae68abef6c0d
2013-03-07 09:22:27 -08:00
Deb Mukherjee
eb6ef2417f Coding non-zero count rather than EOB for coeffs
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.

STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic programming approach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.

TODO:
Training the default probs/counts for nzcs

Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
2013-03-07 07:20:30 -08:00
Dmitry Kovalev
a9961fa819 Merge "Code cleanup." into experimental 2013-03-06 16:57:34 -08:00
Paul Wilkins
72a6201050 Merge "Added stricter Q control flag." into experimental 2013-03-06 04:32:22 -08:00
Paul Wilkins
db6ad0138c Added stricter Q control flag.
Added a variant of the one shot maxQ flag
for two pass that forces a fixed Q for the
normal inter frames. Disabled by default.
Also small adjustment to the Bits per MB
estimation.
Change-Id: I87efdfb2d094fe1340ca9ddae37470d7b278c8b8
2013-03-06 12:05:49 +00:00
Yunqing Wang
f4e383f3d1 Merge "Optimize add_residual function" into experimental 2013-03-05 16:47:58 -08:00
Yunqing Wang
943c6d7172 Optimize add_residual function
Optimized adding diff to predictor, which gave 0.8% decoder
performance gain.
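
For reference, the operation being optimized can be sketched in plain scalar C: add the residual to the predictor and clip to the 8-bit pixel range. The function name, parameter list, and strides below are assumptions for illustration; the optimized version in this commit is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Clip an intermediate value to the 8-bit pixel range. */
uint8_t clip_pixel(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* dest = clip(pred + diff) for a width x height block (scalar sketch). */
void add_residual_sketch(const int16_t *diff, int diff_stride,
                         const uint8_t *pred, int pred_stride,
                         uint8_t *dest, int dest_stride,
                         int width, int height) {
  assert(width > 0 && height > 0);
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      dest[c] = clip_pixel(pred[c] + diff[c]);
    }
    diff += diff_stride;
    pred += pred_stride;
    dest += dest_stride;
  }
}
```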

Change-Id: Ic920f0baa8cbd13a73fa77b7f9da83b58749f0f8
2013-03-05 16:27:45 -08:00
Dmitry Kovalev
7f99c3c59a Code cleanup.
Removing redundant 'extern' keywords, fixing formatting and #include order,
code simplification.

Change-Id: I0e5fdc8009010f3f885f13b5d76859b9da511758
2013-03-05 14:12:16 -08:00
John Koleszar
522d4bf852 Add 'superframe' index
A 'superframe' is a group of frames that share the same PTS, but have a
defined decoding order. This commit adds the ability to append an index
to such a group of frames, allowing for random access to the constituent
frames. This could be useful for frame-level parallelism or partial
decoding in a multilayer scenario.

Decoding the stream serially without such an index should work as a
fallback, and VP9/TestSuperframeIndexIsOptional verifies that.

Change-Id: Idff83b7560e1a7077d8fb067bfbc45b567e78b1c
2013-03-05 12:45:40 -08:00
Ronald S. Bultje
4209bba462 Merge changes Ifacbf5a0,Ibad7c3dd into experimental
* changes:
  vpxenc: actually report mismatch on stderr.
  Make superblocks independent of macroblock code and data.
2013-03-05 11:17:14 -08:00
Dmitry Kovalev
764be4f66f Merge "Code cleanup and simplification of build_4x4uvmvs function." into experimental 2013-03-04 16:57:30 -08:00
Ronald S. Bultje
111ca42133 Make superblocks independent of macroblock code and data.
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.

In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.

Lastly, add a trellis optimize function for 32x32 transform blocks.

HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
a few negative points here and there that I might want to analyze
a little closer.

Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
2013-03-04 16:34:36 -08:00
John Koleszar
daa9b29ea1 Reinitialize motion search tables on frame size change
Make sure the motion search is done with the offsets calculated from
the correct stride.

Change-Id: Ifbcc0f742eda3399c255bfcfa1cdee9a4bb4b4e7
2013-03-04 16:00:01 -08:00
Dmitry Kovalev
49b697d327 Merge "Code cleanup." into experimental 2013-03-04 15:41:15 -08:00
Yunqing Wang
37932d9168 Merge "Optimize vp9_short_idct4x4llm function" into experimental 2013-03-04 14:13:31 -08:00
Yunqing Wang
e8bc9f4220 Optimize vp9_short_idct4x4llm function
Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder
performance.

Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
2013-03-04 12:01:27 -08:00
Jingning Han
5957b2b514 Support 16K sequence coding
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.

The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.
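
A hedged sketch of such a five-byte size header. The commit message only states two bytes per dimension plus one byte for two scaling factors in the range 0 to 15; the little-endian byte order, the nibble packing, and all names below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int width;       /* 0..65535, two bytes */
  int height;      /* 0..65535, two bytes */
  int horiz_scale; /* 0..15, high nibble of the extra byte (assumed) */
  int vert_scale;  /* 0..15, low nibble of the extra byte (assumed) */
} size_header;

void write_size_header(uint8_t out[5], const size_header *h) {
  assert(h->width >= 0 && h->width <= 0xffff);
  assert(h->height >= 0 && h->height <= 0xffff);
  assert(h->horiz_scale >= 0 && h->horiz_scale <= 15);
  assert(h->vert_scale >= 0 && h->vert_scale <= 15);
  out[0] = (uint8_t)(h->width & 0xff);
  out[1] = (uint8_t)(h->width >> 8);
  out[2] = (uint8_t)(h->height & 0xff);
  out[3] = (uint8_t)(h->height >> 8);
  out[4] = (uint8_t)((h->horiz_scale << 4) | h->vert_scale);
}

size_header read_size_header(const uint8_t in[5]) {
  size_header h;
  h.width = in[0] | (in[1] << 8);
  h.height = in[2] | (in[3] << 8);
  h.horiz_scale = in[4] >> 4;
  h.vert_scale = in[4] & 0x0f;
  return h;
}
```

Two bytes per dimension cover sizes up to 65535, which comfortably includes the 16400x16400 case mentioned in the message.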

Tested coding up to 16400x16400 dimension.

Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
2013-03-04 11:08:41 -08:00
John Koleszar
1cfc86ebe0 Add unit test for x4 multi-SAD functions
Update the function prototypes to match between VP9 and VP8.

Change-Id: If58965073989e87df3b62b67a030ec6ce23ca04f
2013-03-01 18:14:02 -08:00
Dmitry Kovalev
b5a9795d25 Code cleanup and simplification of build_4x4uvmvs function.
Change-Id: Iab0176f058045181821ded95ff1cf423af1625f9
2013-03-01 17:50:55 -08:00
Dmitry Kovalev
135428e954 Code cleanup.
Removing redundant 'extern' keyword, lowercase variable names.

Change-Id: I608e8d8579aba8981f5fac3493f77b4481b13808
2013-03-01 17:39:31 -08:00
John Koleszar
69c67c9531 Merge master branch into experimental
Picks up some build system changes, compiler warning fixes, etc.

Change-Id: I2712f99e653502818a101a72696ad54018152d4e
2013-03-01 11:06:05 -08:00
Yaowu Xu
db4dc6f0c0 Merge "Adjust the max_gf_interval initialization" into experimental 2013-03-01 11:02:23 -08:00
Yunqing Wang
67dbc8fe55 Merge "Add eob<=10 case in idct32x32" into experimental 2013-03-01 08:58:19 -08:00
Yaowu Xu
cea8cd08d3 Adjust the max_gf_interval initialization
to be a fixed value of 15.

Test results:
cif:  .124%, .068%, .081%
std-hd: 2.809%, 3.174%, 2.705%

Change-Id: I380c8152c973506094da15eab59e3aa22b75a983
2013-03-01 06:38:35 -08:00
Dmitry Kovalev
852ca19e4b Merge "Code cleanup." into experimental 2013-02-28 17:22:51 -08:00
Yunqing Wang
c550bb3b09 Add eob<=10 case in idct32x32
Simplified idct32x32 calculation when there are only 10 or fewer
non-zero coefficients in 32x32 block. This helps the decoder
performance.

Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
2013-02-28 16:40:29 -08:00
Dmitry Kovalev
253886413a Merge changes I9be9c990,Ic3b97339 into experimental
* changes:
  Ignoring test video sequences in the source tree.
  Code cleanup.
2013-02-28 16:07:45 -08:00
James Zern
a07bed2b2b firstpass.c: correct casting around gf_group_bits
gf_group_bits is int64_t; remove casts to int.

Change-Id: I3b4225905041fac9af9fdfcbcb6f1c357ea4b593
2013-02-28 15:45:29 -08:00
John Koleszar
17c221687f Merge "Fix use of uninitialized memory in CONFIG_ABOVESPREFMV" into experimental 2013-02-28 15:18:50 -08:00
Jim Bankoski
078f5bf439 Merge "mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision" into experimental 2013-02-28 15:16:44 -08:00
Dmitry Kovalev
dcbdda8e15 Code cleanup.
Lower case variable names, converting while loops to for loops.

Change-Id: Ic3b973391eef7472a99d18d02fe79cfef5e04e62
2013-02-28 14:40:20 -08:00
Yunqing Wang
72b146690a Merge "Refactor vp9_dequant_idct_add function" into experimental 2013-02-28 14:34:27 -08:00
Yunqing Wang
6193bc3ba8 Refactor vp9_dequant_idct_add function
Provided a wrapper and removed duplicate code.

Change-Id: Iaef842226ec348422e459202793b001d0983ea30
2013-02-28 14:18:46 -08:00
Scott LaVarnway
aa8fb070b8 Removed vp9_dequantize_b
Change-Id: Ie89bd00d58e30bf4094cb748a282f1dfa81a31d8
2013-02-28 14:08:12 -08:00
Jim Bankoski
8f270acfb2 mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision
Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df
2013-02-28 13:58:15 -08:00
John Koleszar
2eab4372fc Fix use of uninitialized memory in CONFIG_ABOVESPREFMV
The ABOVESPREFMV experiment uses four pixels to the left of the
current block, which don't exist for the left-most column.

Change-Id: I4cf0b42ae8f54c0b3e7b1ed8755704b74fafc39c
2013-02-28 13:48:58 -08:00
Dmitry Kovalev
40fec9b588 Merge "Dequantization code cleanup." into experimental 2013-02-28 13:46:43 -08:00
Dmitry Kovalev
c43906e2e9 Dequantization code cleanup.
Removing redundant variables, using x *= y instead of x = x * y, moving
variable declarations into inner blocks.

Change-Id: I884f95c755f55d51b7c1c6585f10296919063e41
2013-02-28 13:28:05 -08:00
Dmitry Kovalev
0d9cc0a9f0 Code cleanup.
Removing redundant 'extern' keyword, better formatting, code
simplification.

Change-Id: I132fea14f08c706ee9ea147d19464d03f833f25b
2013-02-28 13:18:02 -08:00
John Koleszar
b6a3062d81 Fix incorrect comparison of frame size
The width and height stored in the reference frames are padded out to
a multiple of 16. The Width and Height variables in common are the
displayed size, which may be smaller. The incorrect comparison was
causing scaling related code to be called when it shouldn't have
been. A notable case where this happens is 1080p, since 1088 != 1080.
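
The mismatch is easy to reproduce in isolation. The sketch below uses hypothetical helper names (they are not libvpx functions) to show why a padded dimension must not be compared against a displayed one.

```c
#include <stdbool.h>

/* Round a dimension up to the next multiple of 16. */
static int align_to_16(int dim) { return (dim + 15) & ~15; }

/* Hypothetical check: scaling is needed only when the displayed sizes
 * of the reference and the current frame differ. */
static bool needs_scaling(int ref_display_w, int ref_display_h,
                          int cur_display_w, int cur_display_h) {
  return ref_display_w != cur_display_w || ref_display_h != cur_display_h;
}
```

For 1080p, `align_to_16(1080)` is 1088, so comparing the padded height with the displayed height reports a difference even though both frames are the same size.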

Change-Id: I55f743eeeeaefbf2e777e193bc9a77ff726e16b5
2013-02-28 11:33:02 -08:00
Jim Bankoski
714aa9f3c0 this commit converts all sad ptrs to uint32
sse4_1 code used uint16_t for returning sad, but that
won't work for 32x32 or 64x64.   This code fixes the
assembly for those and also reenables sse4_1 on linux
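
The arithmetic behind this: with 8-bit pixels the largest SAD of a block is width * height * 255. That is 65,280 for 16x16, which still fits in 16 bits, but 261,120 for 32x32 and 1,044,480 for 64x64. A minimal scalar sketch follows; the function names are illustrative and not the rtcd prototypes.

```c
#include <stdint.h>

/* Largest possible SAD for a w x h block of 8-bit pixels. */
static uint32_t max_sad(uint32_t w, uint32_t h) { return w * h * 255u; }

/* Nonzero if the value can be returned through a uint16_t. */
static int fits_in_uint16(uint32_t v) { return v <= UINT16_MAX; }

/* Reference SAD with a 32-bit accumulator and return type. */
static uint32_t sad_u32(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride, int w, int h) {
  uint32_t sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[c] - ref[c];
      sad += (uint32_t)(d < 0 ? -d : d);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```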

Change-Id: I5ce7288d581db870a148e5f7c5092826f59edd81
2013-02-28 08:46:35 -08:00
Jim Bankoski
b715e371c0 fix to parameters to match rtcd
Change-Id: I919e2dd72292fe44f2e53ada56bd42287d50cdeb
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
2013-02-28 08:10:08 -08:00
Christian Duvivier
c129203f7e Faster vp9_short_fdct8x8.
Scalar path is about 1.4x faster (4% overall encoder speedup).
SSE2 path is about 7x faster (13% overall encoder speedup).

Change-Id: I7e85d8225a914a74c61ea370210414696560094d
2013-02-27 17:23:08 -08:00
Dmitry Kovalev
347f3a0aa8 Code cleanup.
Fixing code style, using array lookup instead of switch statements for
forward hybrid transforms (in the same way as for their inverses).
Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places.
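
A minimal sketch of the switch-to-table pattern, assuming a four-entry transform-type enum and placeholder 1-D kernels. The type names, the table name and the stub bodies are invented for illustration; the real transforms are not reproduced here.

```c
/* Hypothetical transform-type enum used as a table index. */
typedef enum { DCT_DCT = 0, ADST_DCT, DCT_ADST, ADST_ADST, TX_TYPE_COUNT } tx_type_t;

typedef void (*transform_1d)(const int *in, int *out);

typedef struct {
  transform_1d cols;  /* applied along columns */
  transform_1d rows;  /* applied along rows */
} transform_2d;

/* Placeholder kernels: stand-ins only, not real DCT/ADST math. */
static void fdct4_stub(const int *in, int *out) {
  for (int i = 0; i < 4; ++i) out[i] = in[i];
}
static void fadst4_stub(const int *in, int *out) {
  for (int i = 0; i < 4; ++i) out[i] = -in[i];
}

/* One row per transform type replaces a switch statement. */
static const transform_2d FHT_4[TX_TYPE_COUNT] = {
  { fdct4_stub,  fdct4_stub  },  /* DCT_DCT   */
  { fadst4_stub, fdct4_stub  },  /* ADST_DCT  */
  { fdct4_stub,  fadst4_stub },  /* DCT_ADST  */
  { fadst4_stub, fadst4_stub }   /* ADST_ADST */
};

static const transform_2d *select_fht4(tx_type_t type) { return &FHT_4[type]; }
```

Keeping the forward and inverse tables in the same order means one enum value indexes both.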

Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
2013-02-27 13:51:04 -08:00
Dmitry Kovalev
9d771f948f Merge "Motion vectors code cleanup." into experimental 2013-02-27 13:34:56 -08:00
Yunqing Wang
bbc7b6a86a Merge "Remove unused file" into experimental 2013-02-27 13:00:10 -08:00
John Koleszar
5ac141187a Merge "Remove unused vp9_copy32xn" into experimental 2013-02-27 12:23:45 -08:00
Yunqing Wang
d6ff6fe2ed Merge "Remove unused file" into experimental 2013-02-27 11:58:29 -08:00
Dmitry Kovalev
0c0de00217 Motion vectors code cleanup.
Fixing indentation, removing redundant parentheses, deciphering single
letter variable names, better spacing.

Change-Id: I1d447a7d69eddbf1e94e0820423615f40ea2d591
2013-02-27 11:48:13 -08:00
Ronald S. Bultje
90932399b4 Merge "Move eob from BLOCKD to MACROBLOCKD." into experimental 2013-02-27 11:39:16 -08:00
Yunqing Wang
8092aaf9ec Merge "Optimize vp9_dc_only_idct_add_c function" into experimental 2013-02-27 11:38:45 -08:00
John Koleszar
09be534f13 Merge "give vp9 variance struct a unique name" 2013-02-27 11:22:36 -08:00
Yunqing Wang
bf6cca44ad Remove unused file
Removed vp9/decoder/x86/vp9_idct_blk_mmx.c

Change-Id: I07ab06382a394cf556fa5a8e3c98b91f6e4f9ce8
2013-02-27 11:13:19 -08:00
Yunqing Wang
5ef694cfb8 Remove unused file
Removed vp9_idctllm_mmx.asm

Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe
2013-02-27 11:00:58 -08:00
Ronald S. Bultje
e8c74e2b70 Move eob from BLOCKD to MACROBLOCKD.
Consistent with VP8.

Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
2013-02-27 11:00:55 -08:00
John Koleszar
0921bfb749 Merge "Use ref_frame_map vice active_ref_idx on the encoder" into experimental 2013-02-27 10:59:08 -08:00
John Koleszar
9615fd8f39 Merge "Test upscaling as well as downscaling" into experimental 2013-02-27 10:25:51 -08:00
John Koleszar
7ad8dbe417 Remove unused vp9_copy32xn
This function was part of an optimization used in VP8 that required
caching two macroblocks. This is unused in VP9, and might not
survive refactoring to support superblocks, so removing it for now.

Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d
2013-02-27 10:24:56 -08:00
John Koleszar
d8e68bd14b Merge changes I922f8602,I0ac3343d into experimental
* changes:
  Use 256-byte aligned filter tables
  Set scale factors consistently for SPLITMV
2013-02-27 10:08:53 -08:00
Jan Kratochvil
82ed3f9a41 Fix --as=nasm compatibility for new asm code.
s/movd/movq/

Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626
2013-02-27 09:55:38 -08:00
John Koleszar
350ba5f30e Merge "Combined motion compensation with scaled predictors" into experimental 2013-02-27 09:46:12 -08:00
John Koleszar
800ad0b886 Use ref_frame_map vice active_ref_idx on the encoder
This patch makes the encoder's use of ref_frame_map and active_ref_idx
consistent with the decoder. ref_frame_map[] maps a reference buffer
index to its actual location in the yv12_fb array, since many
references may share an underlying buffer. active_ref_idx[] mirrors
cpi->{lst,gld,alt}_fb_idx, holding the active references in each
slot.

This also fixes a bug in setup_buffer_inter() where the incorrect
reference was used to populate the scaling factors.

Change-Id: Id3728f6d77cffcd27c248903bf51f9c3e594287e
2013-02-27 08:22:40 -08:00
John Koleszar
b683eecf6d Test upscaling as well as downscaling
Fixes a bug in vp9_set_internal_size() that prevented returning to
the unscaled state. Updated the ResizeInternalTest to scale both
down and up. Added a check that all frames are within 2.5% of the
quality of the initial keyframe.

Change-Id: I3b7ef17cdac144ed05b9148dce6badfa75cff5c8
2013-02-27 08:22:40 -08:00
John Koleszar
6fd7dd1a70 Use 256-byte aligned filter tables
This avoids duplicating all the filters twice. Includes fixups to the
convolve routines and associated tests to make this work.

Change-Id: I922f86021594e55072ddb63b42b2313605db6e00
2013-02-27 08:22:39 -08:00
John Koleszar
77f88e97fa Combined motion compensation with scaled predictors
This patch extends the previous support for using references of a
different resolution in ZEROMV mode to all inter prediction modes.
Subpixel based best-mv scoring is disabled when the reference frame
differs in resolution from the current frame.

Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
2013-02-27 08:22:39 -08:00
John Koleszar
472eeaf082 Set scale factors consistently for SPLITMV
This commit updates the 4x4 prediction to consistently use the
build_2x1_inter_predictor() method. That function is updated to
calculate the scale offset, rather than relying on the caller
to calculate it. In the case that the 2x1 prediction can not
be used, the scale offset is recalculated for each 1x1 block.
The idea here is that the offsets are calculated before each
call to vp9_build_scaled_inter_predictor().

Change-Id: I0ac3343dd54e2846efa3c4195fcd328b709ca04d
2013-02-27 08:22:39 -08:00
Yaowu Xu
858b60e8d0 Merge "Improve 32x32 forward dct" into experimental 2013-02-27 07:56:42 -08:00
John Koleszar
eb939f45b8 Spatial resampling of ZEROMV predictors
This patch allows coding frames using references of different
resolution, in ZEROMV mode. For compound prediction, either
reference may be scaled.

To test, I use the resize_test and enable WRITE_RECON_BUFFER
in vp9_onyxd_if.c. It's also useful to apply this patch to
test/i420_video_source.h:

  --- a/test/i420_video_source.h
  +++ b/test/i420_video_source.h
  @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource {

     virtual void FillFrame() {
       // Read a frame from input_file.
  +    if (frame_ != 3)
       if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) {
         limit_ = frame_;
       }

This forces the frame that the resolution changes on to be coded
with no motion, only scaling, and improves the quality of the
result.

Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
2013-02-26 23:54:23 -08:00
Dmitry Kovalev
c7805395fd Merge "Removing redundant 'extern' keyword from function declarations." into experimental 2013-02-26 20:56:32 -08:00
Ronald S. Bultje
96d260515a Merge "Merge cnvcontext experiment." into experimental 2013-02-26 19:39:39 -08:00
Ronald S. Bultje
1a0533958b Merge "Fix modes.stt output printf format string." into experimental 2013-02-26 19:39:33 -08:00
Ronald S. Bultje
db54e6774f Merge "Minor cosmetics in rdopt." into experimental 2013-02-26 19:39:28 -08:00
Yunqing Wang
35bc02c6eb Optimize vp9_dc_only_idct_add_c function
Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to
improve performance, clipped the absolute diff values to [0, 255].
This allowed us to keep the additions/subtractions in 8 bits.
Test showed an over 2% decoder performance increase.
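
The clipping idea can be shown in scalar C. The sketch below is not the SSE2 code from the commit; it emulates 8-bit saturating add and subtract and uses hypothetical function names. Because the prediction is already in [0, 255], clamping the magnitude of the DC term to 255 gives the same result as adding in wider precision and clamping afterwards.

```c
#include <stdint.h>

/* Scalar stand-ins for unsigned 8-bit saturating add and subtract. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
  const unsigned s = (unsigned)a + b;
  return (uint8_t)(s > 255 ? 255 : s);
}
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* Add a single DC value to n predicted pixels using only 8-bit math. */
static void dc_only_add(int dc, const uint8_t *pred, uint8_t *dst, int n) {
  const int mag = dc < 0 ? -dc : dc;
  const uint8_t m = (uint8_t)(mag > 255 ? 255 : mag);  /* clip to [0, 255] */
  for (int i = 0; i < n; ++i)
    dst[i] = dc >= 0 ? sat_add_u8(pred[i], m) : sat_sub_u8(pred[i], m);
}
```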

Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
2013-02-26 17:16:13 -08:00
James Zern
4446af78f0 Merge "vp9: promote gf_group_bits calculation to 64-bit" into experimental 2013-02-26 16:27:45 -08:00
Dmitry Kovalev
971ff2679f Removing redundant 'extern' keyword from function declarations.
Change-Id: I893fa36297b9bd9cff93d082f1736f6860b15c0d
2013-02-26 15:52:05 -08:00
John Koleszar
25686fc22d Merge "Refactor inter recon functions to support scaling" into experimental 2013-02-26 11:45:28 -08:00
Dmitry Kovalev
998bed1d2c Merge "Changing pitch value meaning for fht and iht transforms." into experimental 2013-02-26 10:44:15 -08:00
Ronald S. Bultje
b1641150b1 Merge cnvcontext experiment.
Change-Id: I35e64998b25694a3bb4a62164bba3c03c1db4bc7
2013-02-26 10:40:15 -08:00
Ronald S. Bultje
f3fdb4c37d Fix modes.stt output printf format string.
Change-Id: I17e2d2f6a4da86d9e4af7bebdea0bf5d154da084
2013-02-26 10:40:15 -08:00
Ronald S. Bultje
71539eae2a Minor cosmetics in rdopt.
Change-Id: I62497dcf2074b4bb4787bf660e727e5cf1bf3472
2013-02-26 10:40:11 -08:00
Ronald S. Bultje
c4ae97911a Merge "make cost_coeffs to use combined context" into experimental 2013-02-26 10:32:01 -08:00
John Koleszar
6a4f708c25 Refactor inter recon functions to support scaling
Ensure that all inter prediction goes through a common code path
that takes scaling into account. Removes a bunch of duplicate
1st/2nd predictor code. Also introduces a 16x8 mode for 8x8
MVs, similar to the 8x4 trick we were doing before. This has an
unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that
case for now.

Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
2013-02-26 10:03:29 -08:00
Yaowu Xu
66d94ac13c Improve 32x32 forward dct
The commit improves the 32x32 forward dct implementation:
1. change to use same constants and rounding as other forward dcts
2. select rounding to specifically minimize the roundtrip error, which
improved the average from 19/block to .77/block over 100000 random inputs.

Test showed a small but consistent gain on all test sets, about .15%

Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
2013-02-26 09:23:01 -08:00
Dmitry Kovalev
9bf3f75168 Changing pitch value meaning for fht and iht transforms.
Pitch now means the number of elements, not the number of bytes.
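
A small sketch of the two conventions for a buffer of 16-bit coefficients. The helper names are hypothetical; the point is only that a byte pitch has to be divided by the element size before it can be used for pointer arithmetic, while an element pitch can be used directly.

```c
#include <stddef.h>
#include <stdint.h>

/* New convention: pitch counts int16_t elements per row. */
static int16_t *row_by_elements(int16_t *base, int row, int pitch_elems) {
  return base + (ptrdiff_t)row * pitch_elems;
}

/* Old convention: pitch counts bytes per row, so it must be halved. */
static int16_t *row_by_bytes(int16_t *base, int row, int pitch_bytes) {
  return base + (ptrdiff_t)row * (pitch_bytes / (int)sizeof(int16_t));
}
```

For a 4-wide block stored contiguously, an element pitch of 4 and a byte pitch of 8 address the same row.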

Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
2013-02-25 18:19:55 -08:00
Yaowu Xu
ecb03e9a3f make cost_coeffs to use combined context
Change-Id: Ia15f4244595fab49bffda0c651a750a8a9481d28
2013-02-25 17:01:33 -08:00
Dmitry Kovalev
9770d564f4 Code cleanup.
Removing switch statements for inverse hybrid transforms. Making code style
consistent for all similar transform implementations. Renaming shortpitch
and short_pitch variables to half_pitch.

Change-Id: I875f7a82aae4e8063a58777bf1cc3f1e67b48582
2013-02-25 15:14:01 -08:00
Dmitry Kovalev
3171b69dee Merge "Code cleanup." into experimental 2013-02-25 14:14:22 -08:00
Dmitry Kovalev
0287d20a05 Merge "Code cleanup." into experimental 2013-02-25 13:58:06 -08:00
Jingning Han
e7b67d33a9 Merge "Improving the forward 16x16 ADST/DCT accuracy" into experimental 2013-02-25 13:38:33 -08:00
Dmitry Kovalev
20b0cb599b Code cleanup.
Removing redundant parentheses, better code formatting, introducing
ROUND_POWER_OF_TWO macro to replace repeated expression.
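
For reference, a rounding right shift of this kind is usually written as below. The definition is the common form of such a macro and is an assumption about the exact text in the tree; `descale_by_16` is a made-up caller.

```c
/* Divide by 2^n with rounding to nearest (ties round up); requires n >= 1. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical use: remove 4 bits of intermediate precision. */
static int descale_by_16(int x) { return ROUND_POWER_OF_TWO(x, 4); }
```

Replacing open-coded `(x + 8) >> 4` style expressions with one macro keeps the rounding offset and the shift amount from drifting apart.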

Change-Id: I91aad7a53ed03482428b2419de4bb99fd92c6771
2013-02-25 13:38:18 -08:00
Dmitry Kovalev
ab196b7e9b Code cleanup.
Lower case names of variables. Removing redundant spaces, parentheses,
casts, and variables.

Change-Id: I55b80c55b7d5adca44c1e8adb40a124c0680f229
2013-02-25 13:33:56 -08:00
James Zern
b2fc3ca066 vp9: promote gf_group_bits calculation to 64-bit
avoids signed integer overflow
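
The general pattern, sketched with hypothetical names rather than the actual first-pass code: widen one operand before multiplying so the product is computed in 64 bits instead of overflowing a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical: total bits for a group of frames. The cast must be
 * applied to an operand, not to the finished 32-bit product. */
static int64_t group_bits(int bits_per_frame, int frames) {
  return (int64_t)bits_per_frame * frames;
}
```

With 100,000,000 bits per frame and 30 frames the result is 3,000,000,000, which does not fit in a 32-bit signed int.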

Change-Id: I9ffcdba90b21edb324d1b173fd11d613e0592931
2013-02-25 13:00:18 -08:00
Paul Wilkins
0e36158c70 Merge "Minor rate control refactoring and experiments." into experimental 2013-02-25 12:49:54 -08:00
Jingning Han
65821d6680 Improving the forward 16x16 ADST/DCT accuracy
Increase the first stage dynamic range by 4 times, and reduce it
back with proper rounding before applying the second stage. Hence
it still fits in the given dynamic range and slightly improves
the key frame coding performance.

Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa
2013-02-25 12:13:37 -08:00
Jingning Han
77a3becf92 clean up forward and inverse hybrid transform
Rebased.

Remove the old matrix multiplication transform computation. The 16x16
ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
300/0 in vp9/common/vp9_blockd.h.

Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
2013-02-25 09:16:12 -08:00
Paul Wilkins
97da8b8c33 Minor rate control refactoring and experiments.
Some minor refactoring code relating to estimates of
bits per MB at a given Q and estimating the allowed Q range.

Most of the changes here were included in a previous commit.
This commit seeks to separate out the refactoring from the
more material changes.

Two #define control flags have been added for experimentation.

ONE_SHOT_Q_ESTIMATE forces the two pass encoder to
use its initial Q range estimate for the whole clip even if this results
in a miss on the target data rate. In effect this tightens the Q range
seen at the expense of rate control accuracy.

DISABLE_RC_LONG_TERM_MEM is a related flag that disables the
long term memory in the rate control. Local adjustments are still
made to try and better hit the rate target on a per frame basis but
the impact of rate control misses is not propagated to the remainder
of the clip. This means that for example an overshoot early on will not
cause frames later in the clip to be starved of bits. Again the result
of this relaxation may be less rate control accuracy especially on short
clips.

The flags are disabled by default for now.

Change-Id: I7482f980146d8ea033b5d50cc689f772e4bd119e
2013-02-25 17:07:45 +00:00
Yaowu Xu
499fe05dc0 optimize forward 16x16 DCT for accuracy
This commit added pre/post scaling for the first half of fDCT16x16 to
reduce error. In a simulation of 100,000 blocks of random inputs,
the average sse dropped from 2.1/block to 0.0498/block.

also enabled tests for 16x16 fDCT and iDCT

Change-Id: Id2a95f0464c6dd4118797d456237ae90274c0f02
2013-02-25 07:47:27 -08:00
Ronald S. Bultje
0c9e2e9a1d Split coefficient token tables intra vs. inter.
Change-Id: I5416455f8f129ca0f450d00e48358d2012605072
2013-02-23 07:33:46 -08:00
Paul Wilkins
c17672a33d Further changes to coefficient contexts.
This patch alters the balance of context between the
coefficient bands (reflecting the position of coefficients
within a transform block) and the energy of the previous
token (or tokens) within a block.

In this case the number of coefficient bands is reduced
but more previous token energy bands are supported.

Some initial rebalancing of the default tables has been done
by running multiple derf clips at multiple data rates using
the ENTROPY_STATS macro. Further balancing needs to be
done using larger image formats especially in regard to
the bigger transform sizes which are not as well represented
in encodings of smaller image formats.

Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
2013-02-23 07:29:09 -08:00
Yaowu Xu
bf0570a7e6 Merge "optimize 8x8 fdct rounding for accuracy" into experimental 2013-02-22 22:20:57 -08:00
Yaowu Xu
22012ee994 optimize 8x8 fdct rounding for accuracy
The commit added a final rounding choice for 8x8 forward dct to get
rid of a sign bias at DC position and improve the accuracy in terms
of round trip error for 8x8 fDCT/iDCT.

This commit also enabled forward 8x8 dct test.

Change-Id: Ib67f99b0a24d513e230c7812bc04569d472fdc50
2013-02-22 16:55:30 -08:00
James Zern
e5fb6321a1 give vp9 variance struct a unique name
variance_vtable clashed with vp8/common/variance.h

Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f
2013-02-22 16:25:13 -08:00
James Zern
c21226b638 Merge "vp8: make gf_group_bits 64-bit" 2013-02-22 15:31:28 -08:00
James Zern
5e0724abad Merge "vp8_first_pass(): avoid floating point div by 0" 2013-02-22 15:30:14 -08:00
James Zern
4e00060d29 vp8: make gf_group_bits 64-bit
avoids signed integer overflow; matches kf_group_bits

Change-Id: I193145cdc4fa53e70fba0a1731a03eb1a574931d
2013-02-22 12:45:28 -08:00
James Zern
fba9772dd2 vp8_first_pass(): avoid floating point div by 0
Change-Id: Id1e6a12db6b0c1d3f64ead8fd8834aadc30fbed2
2013-02-22 12:41:59 -08:00
Jingning Han
936aa281b5 Fixed the buffer overflow issue
The issue that potentially broke the encoding process was due to the fact
that the length of token link is calculated from the total number of tokens
coded, while it is possible, in high bit-rate setting, this length is
greater than the buffer length initially assigned to the cpi->tok.

This patch increases the initially allocated buffer length assigned to
cpi->tok from
(mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)).

It resolves the buffer overflow problem.
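
The two sizes quoted above differ by exactly one token slot per macroblock. A short sketch makes the arithmetic concrete; the function names are illustrative and the 1080p macroblock counts (68 rows by 120 columns) are an example, not taken from the commit.

```c
#include <stddef.h>

/* Token slots under the old formula: 24 * 16 per macroblock. */
static size_t tok_slots_old(size_t mb_rows, size_t mb_cols) {
  return mb_rows * mb_cols * 24 * 16;
}

/* Token slots under the new formula: one extra slot per macroblock. */
static size_t tok_slots_new(size_t mb_rows, size_t mb_cols) {
  return mb_rows * mb_cols * (1 + 24 * 16);
}
```

For 68 x 120 macroblocks this is 3,133,440 slots before the change and 3,141,600 after it.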

Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df
2013-02-22 12:30:35 -08:00
John Koleszar
606a2561d6 Merge "Code cleanup." into experimental 2013-02-22 11:20:20 -08:00
Dmitry Kovalev
548b4dd5f2 Code cleanup.
Removing redundant 'extern' keywords and parentheses, fixing indentation,
making variable names lower case, using short expressions x *= c
instead of x = x * c, minor code simplifications.

Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
2013-02-22 11:03:14 -08:00
Jingning Han
c67a20994f Merge "Forward butterfly hybrid transform" into experimental 2013-02-22 09:20:26 -08:00
Paul Wilkins
b5f3cb6e37 Merge "Experimental removal of over quant code" into experimental 2013-02-22 08:44:40 -08:00
Paul Wilkins
dbf4942046 Experimental removal of over quant code
The over quant code was added in VP8 post
bitstream freeze to allow compression to lower
data rates

In VP9 the real quantizer range has been greatly
extended anyway.

Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
2013-02-22 14:00:51 +00:00
Jingning Han
babbd5d170 Forward butterfly hybrid transform
This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT
hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)).
The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N).

Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
2013-02-21 18:24:28 -08:00
Dmitry Kovalev
5a18106fb7 Code cleanup.
Removing redundant 'extern' keywords. Moving VP9DX_BOOL_DECODER from .h
to .c file.

Change-Id: I5a3056cb3d33db7ed3c3f4629675aa8e21014e66
2013-02-21 13:50:15 -08:00
Ronald S. Bultje
8c16dee4f2 Merge "Remove "eobs" array in MACROBLOCKD." into experimental 2013-02-21 11:30:29 -08:00
John Koleszar
4674312382 Merge "Code cleanup." into experimental 2013-02-21 10:56:17 -08:00
Dmitry Kovalev
5da8534963 Code cleanup.
Removing redundant 'extern' keyword from function declarations and making
function arguments lower case.

Change-Id: Idae9a2183b067f2b6c85ad84738d275e8bbff9d9
2013-02-21 10:34:33 -08:00
Ronald S. Bultje
35524e2231 Remove "eobs" array in MACROBLOCKD.
The information is a duplicate of "eob" in BLOCKD.

Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3
2013-02-21 10:07:36 -08:00
Deb Mukherjee
048f593703 Merge "Refactoring of switchable filter search for speed" into experimental 2013-02-21 09:23:50 -08:00
John Koleszar
138ffb6ea9 Merge "Avoid division in intra prediction" into experimental 2013-02-21 08:33:17 -08:00
Deb Mukherjee
28b1db9278 Refactoring of switchable filter search for speed
Refactors the switchable filter search in the rd loop to
improve encode speed.

Uses a piecewise approximation to a closed form expression to estimate
rd cost for a Laplacian source with a given variance and quantization
step-size.

About 40% encode time reduction is achieved.

Results (on a feb 12 baseline) show a slight drop:

derf: -0.019%
yt: +0.010%
std-hd: -0.162%
hd: -0.050%

Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0
2013-02-20 18:34:42 -08:00
Jingning Han
abfd2a4880 Merge "Fixed the buffer overflow issue" into experimental 2013-02-20 16:27:27 -08:00
Jingning Han
232ccc2fbe Fixed the buffer overflow issue
The issue that potentially broke the encoding process was due to the fact
that the length of token link is calculated from the total number of tokens
coded, while it is possible, in high bit-rate setting, this length is
greater than the buffer length initially assigned to the cpi->tok.

This patch increases the initially allocated buffer length assigned to
cpi->tok from
(mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)).

It resolves the buffer overflow problem.

Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df
2013-02-20 15:41:48 -08:00
Dmitry Kovalev
e6c89a1f9b Merge "Code cleanup." into experimental 2013-02-20 12:47:54 -08:00
Yaowu Xu
441f24de3d Merge "Merge lossless experiment" into experimental 2013-02-20 12:27:26 -08:00
Dmitry Kovalev
eb6aee50a4 Code cleanup.
Change-Id: I7c6e3bebd94856b24dbe2aded7f9e04ef8bb8c08
2013-02-20 11:36:31 -08:00
Yaowu Xu
d262e26cc7 Merge lossless experiment
Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa
2013-02-20 07:54:28 -08:00
Paul Wilkins
ef01b956d8 Entropy stats output code.
Fixes to make Entropy stats code work again

Change-Id: I62e380481a4eb4c170076ac6ab36f0c2b203e914
2013-02-20 14:33:19 +00:00
Tero Rintaluoma
56e6c66b49 Avoid division in intra prediction
- Using multiplication and shifting instead of division in
  intra prediction.
- Maximum absolute difference is 1 for division statements
  in d45, d27, d63 prediction modes. However, errors can
  cumulate for large block sizes when using already predicted
  values.
- Maximum number of non-matching result values in loops using
  division are:
  4x4        0/16
  8x8        0/64
  16x16     10/256
  32x32     13/1024
  64x64    122/4096

  Overall PSNR
  derf:     0.005
  yt:      -0.022
  std-hd:   0.021
  hd:      -0.006
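
For readers unfamiliar with the technique, the sketch below shows a generic multiply-and-shift replacement for division by 3. The constant 21846 (the ceiling of 65536 / 3) and the function names are chosen for this example and are not the constants used in the commit, whose approximation can differ from true division by 1 as noted above.

```c
#include <stdint.h>

/* Plain integer division, for comparison. */
static uint32_t div3_exact(uint32_t x) { return x / 3; }

/* Multiply by ceil(2^16 / 3) and shift right by 16. The rounding error
 * of the reciprocal stays below 1/3 for x < 32768, so the result matches
 * x / 3 across that range, which covers a sum of three 8-bit pixels. */
static uint32_t div3_mul_shift(uint32_t x) { return (x * 21846u) >> 16; }
```

A coarser reciprocal or a smaller shift makes the operation cheaper in SIMD code at the cost of occasional off-by-one results, which is the kind of trade-off the figures above quantify.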

Change-Id: I3979a02eb6351636442c1af1e23d6c4e6ec1d01d
2013-02-20 10:37:36 +02:00
Yaowu Xu
6b1b341774 Merge "fixed an enc/dec mis-match issue" into experimental 2013-02-19 16:53:30 -08:00
Yaowu Xu
b13f38d4b3 fixed an enc/dec mis-match issue
The issue was caused by an out-of-order merge, which led to the wrong
functions being called in lossless mode.

Change-Id: If157729abab62954c729e0377e7f53edb7db22ca
2013-02-19 16:26:27 -08:00
Jingning Han
cd907b1601 16x16 butterfly inverse ADST/DCT hybrid transform
rebased.

This patch includes 16x16 butterfly inverse ADST/DCT hybrid
transform. It uses the variant ADST of kernel
    sin((2k+1)*(2n+1)/4N),
which allows a butterfly implementation.

The coding gains as compared to DCT 16x16 are about 0.1% for
both derf and std-hd. It is noteworthy that for std-hd sets
many sequences gain about 0.5%, some 0.2%. There are also a few
points that show -1% to -3% performance. Hence the average
goes to about 0.1%.

Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17
2013-02-19 09:07:00 -08:00
Ronald S. Bultje
ae81d3a03f Merge "Minor cosmetic cleanups." into experimental 2013-02-19 08:54:44 -08:00
Ronald S. Bultje
0694ea0ed6 Merge "Prevent filling transform size cache with uninitialized values." into experimental 2013-02-19 08:54:35 -08:00
Yaowu Xu
93d6b86cfd Use lossless for Q0
The commit changes the coding mode to lossless whenever the lowest
quantizer is chosen.

As expected, test results showed no difference for cif and std-hd
set where Q0 is rarely used. For yt and yt-hd set, Q0 is used for
a number of clips, where this commit helped a lot in the high end.

Average over all clips in the sets:
yt: 2.391% 1.017% 1.066%
hd: 1.937%  .764%  .787%

Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765
2013-02-19 06:18:42 -08:00
Ronald S. Bultje
aa84c16da2 Minor cosmetic cleanups.
Change-Id: I13d8ae754827368755575dd699a087b3b11f5b16
2013-02-15 17:21:16 -08:00
Ronald S. Bultje
ebfdaa0e0b Prevent filling transform size cache with uninitialized values.
The 32x32 value in case of splitmv was uninitialized. This leads to
all kinds of erratic behaviour down the line. Also fill in dummy values
for superblocks in keyframes (the values are currently unused, but we
run into integer overflows anyway, which makes detecting bad cases
harder). Lastly, in case we did not find any RD value at all, don't
set tx_diff to INT_MIN, but instead set it to zero (since if we couldn't
find a mode, it's unlikely that any particular transform would have made
that worse or better; rather, it's likely equally bad for all tx_sizes).

Change-Id: If236fd3aa2037e5b398d03f3b1978fbbc5ce740e
2013-02-15 17:21:16 -08:00
Ronald S. Bultje
4dfcb129fd Merge "Remove some unused structs and members from the decoder." into experimental 2013-02-15 17:11:38 -08:00
Ronald S. Bultje
5bb103c486 Merge "Remove Y2 and Y-no-DC token types from the bitstream." into experimental 2013-02-15 17:11:20 -08:00
Jingning Han
e343732a92 Fixed a subtle issue that breaks encoding process
This issue breaks the encoding process of the codebase. The effect
emerges only in particular test sequence at certain bit-rates and
frame limits.

Change-Id: I02e080f2a49624eef9a21c424053dc2a1d902452
2013-02-15 14:49:30 -08:00
Ronald S. Bultje
6cde1c58d7 Remove some unused structs and members from the decoder.
Change-Id: Ie309cb1f683a51c5dfac405fb32e8e2d6ee143ed
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
3af36ea8cc Remove Y2 and Y-no-DC token types from the bitstream.
Change-Id: I7a5314daca993d46b8666ba1ec2ff3766c1e5042
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
48598e30b1 Remove y2dc/ac Q delta values from the bitstream.
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.

Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
46dff5d233 Remove some Y2-related code.
Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
2013-02-15 14:06:25 -08:00
Scott LaVarnway
7755657ea7 Merge "WIP: ssse3 version of convolve avg functions" into experimental 2013-02-15 07:54:21 -08:00
John Koleszar
716db10f0d Merge "Moved vp9_get_coef_band to header file" into experimental 2013-02-14 18:02:55 -08:00
Scott LaVarnway
ae886d6bff Moved vp9_get_coef_band to header file
allowing the compiler to inline.

Change-Id: I66e5caf5e7fefa68a223ff0603aa3f9e11e35dbb
2013-02-14 12:27:25 -08:00
Yaowu Xu
03f28c0a12 Merge "Rewrote fdct16x16" into experimental 2013-02-14 09:06:37 -08:00
Paul Wilkins
45712dc8c8 Merge "Abstract selection of coef band." into experimental 2013-02-14 03:23:31 -08:00
Yunqing Wang
048b9d41a6 Rewrote fdct16x16
Used same algorithm as others.

Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0
2013-02-13 16:19:10 -08:00
Ronald S. Bultje
51afedbe28 Merge "Remove 2nd-order transform for first-order DC coefficients." into experimental 2013-02-13 13:58:02 -08:00
Ronald S. Bultje
89a206ef2f Add support for tile rows.
These allow sending partial bitstream packets over the network before
encoding of a complete frame has finished, thus lowering end-to-end
latency. The tile-rows are not independent.

Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2
2013-02-13 12:31:00 -08:00
Ronald S. Bultje
42d6be8080 Remove 2nd-order transform for first-order DC coefficients.
Since addition of the larger-scale transforms (16x16, 32x32), these
don't give a benefit at macroblock-sizes anymore. At superblock-sizes,
2nd-order transform was never used over the larger transforms. Future
work should test whether there is a benefit for that use case.

Change-Id: I90cadfc42befaf201de3eb0c4f7330c56e33330a
2013-02-13 12:28:19 -08:00
Paul Wilkins
9255ad107f Abstract selection of coef band.
This patch abstracts the selection of the coefficient band
context into a function as a precursor to further experiments
with the coefficient context.

It also removes the large per TX size coefficient band structures
and uses a single matrix for all block sizes within the test function.

This may have an impact on quality (results to follow) but is only an
intermediate step in the process of redefining the context. Also the
quality impact will be larger initially because the default tables will
be out of step with the new banding.

In particular the 4x4 will in this case only use 7 bands. If needed we
can add back block size dependency localized within the function, but
this can follow on after the other changes to the definition of the
context.

Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5
2013-02-13 19:01:25 +00:00
Paul Wilkins
56049d9488 Fixed encoder decoder mismatch.
Reverted part of change
I19981d1ef0b33e4e5732739574f367fe82771a84

That gives rise to an enc/dec mismatch.
As things stand the memsets are still needed.

Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30
2013-02-13 18:56:56 +00:00
Paul Wilkins
0d284ffed1 Abstract the selection of coefficient context.
This is an initial step to facilitate experimentation
with changes to the prior token context used to code
coefficients to take better account of the energy of
preceding tokens.

This patch merely abstracts the selection of context into
two functions and does not alter the output.

Change-Id: I117fff0b49c61da83aed641e36620442f86def86
2013-02-13 18:56:30 +00:00
Paul Wilkins
afa57bfc97 Merge "Remove NEWCOEFCONTEXT experiment." into experimental 2013-02-13 10:41:13 -08:00
Yaowu Xu
f01b08c96c Merge "enable bitstream lossless support" into experimental 2013-02-13 10:26:58 -08:00
Yaowu Xu
d3de97794f Merge "fix the lossless experiment" into experimental 2013-02-13 09:54:35 -08:00
Yaowu Xu
17db5d00be enable bitstream lossless support
1. Added a bit in the frame header to indicate if a frame is encoded
in lossless mode, so the decoder does not make the decision based on Q0.
2. Minor changes to make sure that lossy coding works same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
prefix fwd_txm and inv_txm for forward and inverse respectively

To encode in lossless mode, use "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.
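
Item 1 above can be illustrated with a minimal sketch, assuming a simplified two-byte header. The `frame_header` struct, `write_header` and `read_header` are hypothetical names for illustration only and are not the libvpx bitstream layout, which is bit-packed. The point is only that the decoder reads an explicit flag instead of inferring lossless mode from a quantizer index of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified frame header: not the real VP9 layout. */
typedef struct {
  int base_qindex; /* quantizer index, 0..255 */
  int lossless;    /* explicit flag carried in the header */
} frame_header;

/* Encoder side: store the flag explicitly next to the quantizer index. */
static void write_header(const frame_header *h, uint8_t out[2]) {
  assert(h->base_qindex >= 0 && h->base_qindex <= 255);
  out[0] = (uint8_t)h->base_qindex;
  out[1] = (uint8_t)(h->lossless ? 1 : 0);
}

/* Decoder side: take the flag from the header rather than testing
 * base_qindex == 0. */
static void read_header(const uint8_t in[2], frame_header *h) {
  h->base_qindex = in[0];
  h->lossless = in[1] & 1;
}
```

With this layout a frame written with `base_qindex = 0` and `lossless = 0` is read back as lossy, which a decoder keyed on Q0 alone could not distinguish.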

Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
2013-02-13 09:24:39 -08:00
Yaowu Xu
16f25f9dc8 fix the lossless experiment
Change-Id: I95acfc1417634b52d344586ab97f0abaa9a4b256
2013-02-13 09:20:26 -08:00
Scott LaVarnway
30f866f44b WIP: ssse3 version of convolve avg functions
Adds initial ssse3 convolve avg functions; this is one step closer
to using x86inc.asm. The decoder performance improved by 8% for
the test clip used. This should be revisited later to see if
averaging outside the loop is better than having many similar
filter functions.

Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
2013-02-13 09:15:38 -08:00
Paul Wilkins
6a9f0c61a4 Remove NEWCOEFCONTEXT experiment.
Removal of the NEWCOEFCONTEXT experiment to
reduce code clutter and make it easier to experiment with
some other changes to the coefficient coding context.

Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
2013-02-13 15:12:17 +00:00
Paul Wilkins
649be94cf0 Removal of Hybrid DWT/DCT experiment.
Removal of experiment to simplify code base for other
changes.

Change-Id: If0a33952504558511926ad212bc311fc2bffb19a
2013-02-13 15:08:48 +00:00
Christian Duvivier
097f205289 Merge "Faster vp9_regular_quantize_b_8x8." into experimental 2013-02-12 17:08:00 -08:00
Christian Duvivier
0e4397f0cd Faster vp9_regular_quantize_b_8x8.
A couple of scalar optimizations speeding up quantization by about 1.6x. Overall encoder speedup is around 3%.

Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84
2013-02-12 15:55:58 -08:00
Yunqing Wang
7630cf0c3f Merge "Rewrote fdct8x8" into experimental 2013-02-12 15:52:31 -08:00
John Koleszar
1d60b6bcb5 Merge "Replace as_mv struct with array" into experimental 2013-02-12 13:59:04 -08:00
Ronald S. Bultje
f496f601fb Add tile column size limits (256 pixels min, 4096 pixels max).
This is after discussion with the hardware team. Update the unit test
to take these sizes into account. Split out some duplicate code into
a separate file so it can be shared.
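
As a rough illustration of the limits above, the sketch below computes which power-of-two tile column counts keep every tile between 256 and 4096 pixels wide. The helper name `tile_col_log2_range` and the use of `frame_width >> n` as the approximate tile width are assumptions made for this example; the actual code in this change may round or align differently.

```c
#include <assert.h>

#define MIN_TILE_WIDTH 256  /* pixels, from the commit subject */
#define MAX_TILE_WIDTH 4096 /* pixels, from the commit subject */

/* Hypothetical helper: find the smallest and largest log2 tile column
 * counts such that the approximate tile width (frame_width >> log2)
 * stays within [MIN_TILE_WIDTH, MAX_TILE_WIDTH]. Frames narrower than
 * MIN_TILE_WIDTH get a single tile column. */
static void tile_col_log2_range(int frame_width, int *min_log2,
                                int *max_log2) {
  int lo = 0, hi;
  assert(frame_width > 0);
  /* Split until no tile is wider than the maximum. */
  while ((frame_width >> lo) > MAX_TILE_WIDTH) ++lo;
  /* Keep splitting while tiles would still meet the minimum width. */
  hi = lo;
  while ((frame_width >> (hi + 1)) >= MIN_TILE_WIDTH) ++hi;
  *min_log2 = lo;
  *max_log2 = hi;
}
```

For a 1920-pixel-wide frame this gives a range of 0 to 2 (1 to 4 tile columns); for an 8192-pixel-wide frame it gives 1 to 5.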

Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5
2013-02-12 10:33:34 -08:00
Ronald S. Bultje
cb00be1fa2 Merge "Clean up detokenize contextualization to be like tokenizer." into experimental 2013-02-12 09:47:29 -08:00
Scott LaVarnway
ff024f812b Merge "Bug fix: ssse3 version of subpixel did not match C code" into experimental 2013-02-12 08:45:24 -08:00
Yunqing Wang
aa295918ed Rewrote fdct8x8
Use consistent algorithm.

Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3
2013-02-11 22:28:05 -08:00
Ronald S. Bultje
491d095214 Clean up detokenize contextualization to be like tokenizer.
Change-Id: I47174f797df2103da8913c6fb4f4e741817bae82
2013-02-11 17:21:37 -08:00
Christian Duvivier
094e2572df Faster convolve8_avg.
Implement convolve8_avg using common functions which are already optimized
instead of using more obscure ones which have only C versions. Encoder
overall speed-up of about 12%.

Change-Id: I8c57aa76936c8a48f22b115f19f61d9f2ae1e4b6
2013-02-11 16:53:11 -08:00
Jingning Han
f1060e4cd8 Merge "butterfly inverse 4x4 ADST" into experimental 2013-02-11 14:46:06 -08:00
Yunqing Wang
ab2dc6ae57 Merge "Integerization of dct32x32" into experimental 2013-02-11 12:15:26 -08:00
Jingning Han
57e995ff9c butterfly inverse 4x4 ADST
Fixed format issues.

Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.

Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
2013-02-11 10:42:39 -08:00
Ronald S. Bultje
5f2e8449b7 Merge "Port sadNxNx4d functions to x86inc.asm." into experimental 2013-02-11 08:20:12 -08:00
Paul Wilkins
aec5bed3db Change rd thresholds and add speed trade off flags.
Experimental tweaks to various thresholds to measure
quality / speed trade off.

Add flag that allows static segmentation to be turned off
and disables it unless in the second pass of a two pass
encode.

Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b
2013-02-11 11:54:36 +00:00
Scott LaVarnway
eda30b410e Bug fix: ssse3 version of subpixel did not match C code
A 16-bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters
(vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem.
Also added ssse3 support for 4x4 subpixel filtering.
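
Why the order of the adds matters can be shown with a scalar sketch. It assumes the intermediate sums live in signed 16-bit lanes with saturating addition; the commit does not spell out the exact instructions, and the sample values below are made up for illustration rather than taken from the filter taps.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit add, emulating one lane of a saturating
 * SIMD add (assumed behaviour, see the note above). */
static int16_t sat_add16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}

/* Accumulate n partial products strictly left to right. */
static int16_t sum_in_order(const int16_t *p, int n) {
  int16_t acc = 0;
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) acc = sat_add16(acc, p[i]);
  return acc;
}
```

Summing {20000, 15000, -10000, -5000} in that order saturates at the second add and yields 17767, while the reordered {20000, -10000, 15000, -5000} keeps every intermediate in range and yields the exact result 20000.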

Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
2013-02-09 15:15:14 -08:00
Paul Wilkins
e4f949b55a Merge "Nearest / Zero Mv default entropy tweak." into experimental 2013-02-09 04:21:08 -08:00
John Koleszar
7ca517f755 Replace as_mv struct with array
Replace as_mv.{first, second} with a two element array, so that they
can easily be processed with an index variable.
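
A minimal sketch of what the array form enables, using hypothetical type and function names (`mv_t`, `mv_pair_array`, `clamp_mvs`) that are not taken from the patch. With named members `first` and `second` the same operation has to be written out twice; with an array a single loop covers both vectors.

```c
#include <assert.h>

/* Hypothetical motion vector type for illustration. */
typedef struct {
  short row, col;
} mv_t;

/* Array form: mv[0] plays the role of "first", mv[1] of "second". */
typedef struct {
  mv_t mv[2];
} mv_pair_array;

static short clamp_short(short v, short limit) {
  if (v > limit) return limit;
  if (v < -limit) return (short)-limit;
  return v;
}

/* Clamp the first num_refs vectors to [-limit, limit] using an index
 * variable instead of duplicated code per member. */
static void clamp_mvs(mv_pair_array *p, int num_refs, short limit) {
  int i;
  assert(num_refs >= 1 && num_refs <= 2);
  for (i = 0; i < num_refs; ++i) {
    p->mv[i].row = clamp_short(p->mv[i].row, limit);
    p->mv[i].col = clamp_short(p->mv[i].col, limit);
  }
}
```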

Change-Id: I1e429155544d2a94a5b72a5b467c53d8b8728190
2013-02-08 20:23:35 -08:00
John Koleszar
dc836109e4 Merge "Pass macroblock index to pick inter functions" into experimental 2013-02-08 20:20:37 -08:00
Ronald S. Bultje
c0ce2ab349 Port sadNxNx4d functions to x86inc.asm.
Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8
2013-02-08 17:59:32 -08:00
Ronald S. Bultje
02ff360b33 Add sad64x64 and sad32x32 SSE2 versions.
Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this
makes them all slightly faster, particularly on x86-64. Remove SSE3
sad16x16 version, since the SSE2 version is now faster.

About 1.5% overall encoding speedup.

Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797
2013-02-08 16:32:25 -08:00
Ronald S. Bultje
639b863d22 Make cost_coeffs() more efficient.
Cache the constant offset in one variable to prevent re-loading that
in each loop iteration, and mark the function as inline so we can use
the fact that the transform size is always known in the caller.
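
The two ideas above can be sketched as follows. This is not the real cost_coeffs(); the function name `sum_token_costs`, the flat table layout and the table sizes are assumptions for illustration. The per-transform-size offset is computed once before the loop, and `static inline` lets a compiler specialize the body when the caller passes a constant `tx_size`.

```c
#include <assert.h>

enum { NUM_TX_SIZES = 4, NUM_TOKENS = 12 }; /* hypothetical sizes */

/* costs is a flat table of NUM_TX_SIZES * NUM_TOKENS entries. */
static inline int sum_token_costs(const int *costs, int tx_size,
                                  const unsigned char *tokens, int n) {
  const int *row;
  int i, total = 0;
  assert(tx_size >= 0 && tx_size < NUM_TX_SIZES);
  /* Cache the constant offset once instead of recomputing
   * costs + tx_size * NUM_TOKENS on every iteration. */
  row = costs + tx_size * NUM_TOKENS;
  for (i = 0; i < n; ++i) {
    assert(tokens[i] < NUM_TOKENS);
    total += row[tokens[i]];
  }
  return total;
}
```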

Almost 1% faster encoding overall.

Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a
2013-02-08 16:32:24 -08:00
John Koleszar
6125a1ed81 Pass macroblock index to pick inter functions
Pass the current mb row and column around rather than the
recon_yoffset and recon_uvoffset, since those offsets will
change from predictor to predictor, based on the reference
frame selection.

Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1
2013-02-08 14:25:40 -08:00