4415 Commits

Author SHA1 Message Date
Dmitry Kovalev
3edbc77ae3 Merge "Consistent usage of ROUND_POWER_OF_TWO macro." into experimental 2013-03-08 11:35:22 -08:00
Yunqing Wang
2e0553227e Merge "Optimize add_constant_residual function" into experimental 2013-03-08 10:18:52 -08:00
Jingning Han
2a5278bdbd Extend diff MV limit from +/-256 to +/-1024
Increase the motion search range by 4x. Change MV_CLASS tree of the
entropy coding to allow two additional mv classes to cover the
extended motion vector limit. The codec determines the effective
motion search range conditioned on the actual frame dimension.

It provides coding gains:

stdhd 0.39%
yt    0.56%
hd    0.47%

Major coding performance gains are packed in several sequences with
intense motion activities, e.g., ped_1080p gains 7% at high bit-rates,
and on average 3%.

TODO: Need to further tune the rate control and motion search units.

Change-Id: Ib842540a6796fbee5a797809433ef6a477c6d78d
2013-03-08 10:04:36 -08:00
Ronald S. Bultje
b41dee8428 Add support for tx_select in i8x8 encoding in keyframes.
Also enable tx_select for keyframes.

Change-Id: Iadb1231d9fa7af0c8dce3d9b41830b93a302479e
2013-03-08 09:28:46 -08:00
Yunqing Wang
f240782650 Optimize add_constant_residual function
Optimized adding constant diff to predictor, which gave about
2% decoder performance gain.

Change-Id: I47db20c31428e8c4a8f16214a85cbe386a6e9303
2013-03-07 15:49:07 -08:00
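For reference, the operation that f240782650 optimizes can be sketched in scalar C. This is only an illustration of "adding constant diff to predictor": the function name, argument order, and 8-bit clamping below are assumptions, not the signature used in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_pixel(int value) {
  if (value < 0) return 0;
  if (value > 255) return 255;
  return (uint8_t)value;
}

/* Hypothetical scalar reference: dest = clip(pred + diff), with one
 * constant diff applied to every pixel of a width x height block. */
void add_constant_residual_sketch(int diff,
                                  const uint8_t *pred, int pred_stride,
                                  uint8_t *dest, int dest_stride,
                                  int width, int height) {
  assert(width > 0 && height > 0);
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      dest[r * dest_stride + c] =
          clip_pixel(pred[r * pred_stride + c] + diff);
    }
  }
}
```

Because the diff is the same for every pixel, a SIMD version can broadcast it once and use saturating adds, which is presumably where a gain like the one reported comes from.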
Yunqing Wang
6fdd4d26de Merge "Allocate 16-byte aligned diff buffer" into experimental 2013-03-07 15:40:38 -08:00
Yunqing Wang
b339aea675 Allocate 16-byte aligned diff buffer
This was done based on John's suggestion.

Change-Id: I62516a513c31fe3dbea0d6cd063df79d9e819ec8
2013-03-07 15:29:27 -08:00
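One portable way to obtain a 16-byte aligned buffer, as b339aea675 describes, is to over-allocate and round the pointer up. The sketch below is an assumption about the general technique only; the helper names are hypothetical and the commit may use a different allocator or a static aligned declaration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return a 16-byte aligned block of at least `size`
 * bytes. The raw malloc pointer is stored just before the aligned
 * address so it can be recovered when freeing. */
void *memalign16_sketch(size_t size) {
  void *raw = malloc(size + 15 + sizeof(void *));
  if (raw == NULL) return NULL;
  uintptr_t start = (uintptr_t)raw + sizeof(void *);
  uintptr_t aligned = (start + 15) & ~(uintptr_t)15;
  assert((aligned & 15) == 0);
  ((void **)aligned)[-1] = raw;
  return (void *)aligned;
}

/* Free a block obtained from memalign16_sketch. */
void free_aligned_sketch(void *ptr) {
  if (ptr != NULL) free(((void **)ptr)[-1]);
}
```

An aligned diff buffer lets SSE2 code use aligned loads on it without faulting.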
Dmitry Kovalev
3603dfb62c Consistent usage of ROUND_POWER_OF_TWO macro.
Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
2013-03-07 12:24:35 -08:00
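The log does not show the macro that 3603dfb62c standardizes on. A conventional round-to-nearest definition looks like the sketch below; treat the exact form, and the wrapper function, as assumptions.

```c
#include <assert.h>

/* Divide by 2^n with rounding to nearest; assumes value >= 0 and n >= 1. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical function form that checks the assumptions above. */
int round_power_of_two(int value, int n) {
  assert(value >= 0 && n >= 1 && n < 31);
  return ROUND_POWER_OF_TWO(value, n);
}
```

Using one macro everywhere replaces hand-written variants such as (x + 64) >> 7, which are easy to get subtly wrong.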
Ronald S. Bultje
89e4ce20d0 Update ADST selection if tx_size < block_size.
Change-Id: Ic9b336486774c95ffbb92adcb110cc0fc2a83cc5
2013-03-07 11:19:15 -08:00
Ronald S. Bultje
d3724abe9f Re-add support for ADST in superblocks.
This also changes the RD search to take account of the correct block
index when searching (this is required for ADST positioning to work
correctly in combination with tx_select).

Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
2013-03-07 11:19:10 -08:00
Yunqing Wang
3162371544 Fix issue in add_residual intrinsic function
Yaowu found this function had a compiling issue with MSVC because
of using _mm_storel_pi((__m64 *)(dest + 0 * stride), (__m128)p0).
To be safe, changed back to use integer store instruction.

Also, for some builds, diff could not always be 16-byte aligned.
Changed that in the code.

Change-Id: I9995e5446af15dad18f3c5c0bad1ae68abef6c0d
2013-03-07 09:22:27 -08:00
Deb Mukherjee
eb6ef2417f Coding non-zero count rather than EOB for coeffs
This patch revamps the entropy coding of coefficients to code first
a non-zero count per coded block and correspondingly remove the EOB
token from the token set.

STATUS:
Main encode/decode code achieving encode/decode sync - done.
Forward and backward probability updates to the nzcs - done.
Rd costing updates for nzcs - done.
Note: The dynamic programming approach used in trellis quantization
is not exactly compatible with nzcs. A suboptimal approach has been
used instead where branch costs are updated to account for changes
in the nzcs.

TODO:
Training the default probs/counts for nzcs

Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
2013-03-07 07:20:30 -08:00
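The difference between the two quantities in eb6ef2417f can be illustrated on a coefficient array that is already in scan order. The function names below are hypothetical, and the sketch only computes the values; it does not model the entropy coder.

```c
#include <assert.h>
#include <stdint.h>

/* End-of-block position: one past the last non-zero coefficient. */
int eob_position(const int16_t *coeffs, int n) {
  assert(n >= 0);
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (coeffs[i] != 0) eob = i + 1;
  }
  return eob;
}

/* Non-zero count (nzc): how many coefficients are non-zero. */
int nonzero_count(const int16_t *coeffs, int n) {
  assert(n >= 0);
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (coeffs[i] != 0) ++count;
  }
  return count;
}
```

For the block {5, 0, -2, 0, 0, 1, 0, 0} the EOB position is 6 while the nzc is 3. With the count sent up front, a decoder can stop once that many non-zero tokens have been read, so no EOB token is needed.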
Dmitry Kovalev
a9961fa819 Merge "Code cleanup." into experimental 2013-03-06 16:57:34 -08:00
Paul Wilkins
72a6201050 Merge "Added stricter Q control flag." into experimental 2013-03-06 04:32:22 -08:00
Paul Wilkins
db6ad0138c Added stricter Q control flag.
Added a variant of the one shot maxQ flag
for two pass that forces a fixed Q for the
normal inter frames. Disabled by default.
Also small adjustment to the Bits per MB
estimation.
Change-Id: I87efdfb2d094fe1340ca9ddae37470d7b278c8b8
2013-03-06 12:05:49 +00:00
Yunqing Wang
f4e383f3d1 Merge "Optimize add_residual function" into experimental 2013-03-05 16:47:58 -08:00
Yunqing Wang
943c6d7172 Optimize add_residual function
Optimized adding diff to predictor, which gave 0.8% decoder
performance gain.

Change-Id: Ic920f0baa8cbd13a73fa77b7f9da83b58749f0f8
2013-03-05 16:27:45 -08:00
Dmitry Kovalev
7f99c3c59a Code cleanup.
Removing redundant 'extern' keywords, fixing formatting and #include order,
code simplification.

Change-Id: I0e5fdc8009010f3f885f13b5d76859b9da511758
2013-03-05 14:12:16 -08:00
John Koleszar
522d4bf852 Add 'superframe' index
A 'superframe' is a group of frames that share the same PTS, but have a
defined decoding order. This commit adds the ability to append an index
to such a group of frames, allowing for random access to the constituent
frames. This could be useful for frame-level parallelism or partial
decoding in a multilayer scenario.

Decoding the stream serially without such an index should work as a
fallback, and VP9/TestSuperframeIndexIsOptional verifies that.

Change-Id: Idff83b7560e1a7077d8fb067bfbc45b567e78b1c
2013-03-05 12:45:40 -08:00
Ronald S. Bultje
4209bba462 Merge changes Ifacbf5a0,Ibad7c3dd into experimental
* changes:
  vpxenc: actually report mismatch on stderr.
  Make superblocks independent of macroblock code and data.
2013-03-05 11:17:14 -08:00
Dmitry Kovalev
764be4f66f Merge "Code cleanup and simplification of build_4x4uvmvs function." into experimental 2013-03-04 16:57:30 -08:00
Ronald S. Bultje
111ca42133 Make superblocks independent of macroblock code and data.
Split macroblock and superblock tokenization and detokenization
functions and coefficient-related data structs so that the bitstream
layout and related code of superblock coefficients looks less like it's
a hack to fit macroblocks in superblocks.

In addition, unify chroma transform size selection from luma transform
size (i.e. always use the same size, as long as it fits the predictor);
in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
transform will now use the 16x16 (instead of the 8x8) chroma transform,
and 64x64 superblocks using the 32x32 luma transform will now use the
32x32 (instead of the 16x16) chroma transform.

Lastly, add a trellis optimize function for 32x32 transform blocks.

HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are
a few negative points here and there that I might want to analyze
a little closer.

Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
2013-03-04 16:34:36 -08:00
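The chroma rule in 111ca42133 ("always use the same size, as long as it fits the predictor") can be sketched as a clamp. The enum, the function names, and the assumption of 4:2:0 subsampling (chroma predictor side is half the luma side) are illustrative and not taken from the tree.

```c
#include <assert.h>

typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 } TxSize;

/* Largest transform that fits a square block with the given side. */
static TxSize max_tx_for_side(int side) {
  if (side >= 32) return TX_32X32;
  if (side >= 16) return TX_16X16;
  if (side >= 8) return TX_8X8;
  return TX_4X4;
}

/* Hypothetical selection: reuse the luma transform size for chroma unless
 * it is larger than the chroma predictor, in which case clamp it. */
TxSize chroma_tx_size(TxSize luma_tx, int luma_block_side) {
  assert(luma_block_side == 16 || luma_block_side == 32 ||
         luma_block_side == 64);
  TxSize max_uv = max_tx_for_side(luma_block_side / 2); /* assumes 4:2:0 */
  return luma_tx < max_uv ? luma_tx : max_uv;
}
```

Under these assumptions, a 64x64 superblock with a 32x32 luma transform gets a 32x32 chroma transform, while a 32x32 superblock with the same luma transform is clamped to 16x16.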
John Koleszar
daa9b29ea1 Reinitialize motion search tables on frame size change
Make sure the motion search is done with the offsets calculated from
the correct stride.

Change-Id: Ifbcc0f742eda3399c255bfcfa1cdee9a4bb4b4e7
2013-03-04 16:00:01 -08:00
Dmitry Kovalev
49b697d327 Merge "Code cleanup." into experimental 2013-03-04 15:41:15 -08:00
Yunqing Wang
37932d9168 Merge "Optimize vp9_short_idct4x4llm function" into experimental 2013-03-04 14:13:31 -08:00
Yunqing Wang
e8bc9f4220 Optimize vp9_short_idct4x4llm function
Wrote an SSE2 vp9_short_idct4x4llm to improve the decoder
performance.

Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
2013-03-04 12:01:27 -08:00
Jingning Han
5957b2b514 Support 16K sequence coding
Fixed a couple of variable/function definitions, as well as header
handling to support 16K sequence coding at high bit-rates.

The width and height are each specified by two bytes in the header.
Use an extra byte to explicitly indicate the scaling factors in
both directions, each ranging from 0 to 15.

Tested coding up to 16400x16400 dimension.

Change-Id: Ibc2225c6036620270f2c0cf5172d1760aaec10ec
2013-03-04 11:08:41 -08:00
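The header layout that 5957b2b514 describes (two bytes each for width and height, plus one byte carrying two scaling factors in the range 0 to 15) could be packed as sketched below. The byte order, the nibble assignment, and all names are assumptions; the log does not specify them.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int width;       /* 0..65535 */
  int height;      /* 0..65535 */
  int horiz_scale; /* 0..15 */
  int vert_scale;  /* 0..15 */
} FrameSizeSketch;

/* Hypothetical writer: little-endian dimensions, then one scale byte with
 * the horizontal factor in the high nibble and the vertical in the low. */
void write_frame_size(uint8_t out[5], const FrameSizeSketch *f) {
  assert(f->width >= 0 && f->width <= 0xFFFF);
  assert(f->height >= 0 && f->height <= 0xFFFF);
  assert(f->horiz_scale >= 0 && f->horiz_scale <= 15);
  assert(f->vert_scale >= 0 && f->vert_scale <= 15);
  out[0] = (uint8_t)(f->width & 0xFF);
  out[1] = (uint8_t)(f->width >> 8);
  out[2] = (uint8_t)(f->height & 0xFF);
  out[3] = (uint8_t)(f->height >> 8);
  out[4] = (uint8_t)((f->horiz_scale << 4) | f->vert_scale);
}

/* Matching reader for the layout above. */
FrameSizeSketch read_frame_size(const uint8_t in[5]) {
  FrameSizeSketch f;
  f.width = in[0] | (in[1] << 8);
  f.height = in[2] | (in[3] << 8);
  f.horiz_scale = in[4] >> 4;
  f.vert_scale = in[4] & 0x0F;
  return f;
}
```

Two bytes cover dimensions up to 65535, which comfortably includes the 16400x16400 case mentioned in the commit.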
John Koleszar
1cfc86ebe0 Add unit test for x4 multi-SAD functions
Update the function prototypes to match between VP9 and VP8.

Change-Id: If58965073989e87df3b62b67a030ec6ce23ca04f
2013-03-01 18:14:02 -08:00
Dmitry Kovalev
b5a9795d25 Code cleanup and simplification of build_4x4uvmvs function.
Change-Id: Iab0176f058045181821ded95ff1cf423af1625f9
2013-03-01 17:50:55 -08:00
Dmitry Kovalev
135428e954 Code cleanup.
Removing redundant 'extern' keyword, lowercase variable names.

Change-Id: I608e8d8579aba8981f5fac3493f77b4481b13808
2013-03-01 17:39:31 -08:00
John Koleszar
69c67c9531 Merge master branch into experimental
Picks up some build system changes, compiler warning fixes, etc.

Change-Id: I2712f99e653502818a101a72696ad54018152d4e
2013-03-01 11:06:05 -08:00
Yaowu Xu
db4dc6f0c0 Merge "Adjust the max_gf_interval initialization" into experimental 2013-03-01 11:02:23 -08:00
Yunqing Wang
67dbc8fe55 Merge "Add eob<=10 case in idct32x32" into experimental 2013-03-01 08:58:19 -08:00
Yaowu Xu
cea8cd08d3 Adjust the max_gf_interval initialization
to be a fixed value of 15.

Test results:
cif:  .124%, .068%, .081%
std-hd: 2.809%, 3.174%, 2.705%

Change-Id: I380c8152c973506094da15eab59e3aa22b75a983
2013-03-01 06:38:35 -08:00
Dmitry Kovalev
852ca19e4b Merge "Code cleanup." into experimental 2013-02-28 17:22:51 -08:00
Yunqing Wang
c550bb3b09 Add eob<=10 case in idct32x32
Simplified idct32x32 calculation when there are only 10 or fewer
non-zero coefficients in a 32x32 block. This helps the decoder
performance.

Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d
2013-02-28 16:40:29 -08:00
Dmitry Kovalev
253886413a Merge changes I9be9c990,Ic3b97339 into experimental
* changes:
  Ignoring test video sequences in the source tree.
  Code cleanup.
2013-02-28 16:07:45 -08:00
James Zern
a07bed2b2b firstpass.c: correct casting around gf_group_bits
gf_group_bits is int64_t; remove casts to int.

Change-Id: I3b4225905041fac9af9fdfcbcb6f1c357ea4b593
2013-02-28 15:45:29 -08:00
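Why casts to int are a problem for a 64-bit bit budget such as gf_group_bits in a07bed2b2b can be shown with a small check. The helper names are hypothetical, and the test values assume a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if a 64-bit value would survive a cast to int unchanged. */
int fits_in_int(int64_t bits) {
  return bits >= INT_MIN && bits <= INT_MAX;
}

/* Hypothetical helper: take a fraction of a group budget while keeping
 * every intermediate value in 64 bits. */
int64_t scale_bits(int64_t group_bits, int numerator, int denominator) {
  assert(denominator > 0);
  return group_bits * numerator / denominator;
}
```

A budget of 3 * 2^31 bits does not fit in a 32-bit int, so casting it before the arithmetic would change the result; keeping the computation in int64_t avoids that.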
John Koleszar
17c221687f Merge "Fix use of uninitialized memory in CONFIG_ABOVESPREFMV" into experimental 2013-02-28 15:18:50 -08:00
Jim Bankoski
078f5bf439 Merge "mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision" into experimental 2013-02-28 15:16:44 -08:00
Dmitry Kovalev
dcbdda8e15 Code cleanup.
Lower case variable names, converting while loops to for loops.

Change-Id: Ic3b973391eef7472a99d18d02fe79cfef5e04e62
2013-02-28 14:40:20 -08:00
Yunqing Wang
72b146690a Merge "Refactor vp9_dequant_idct_add function" into experimental 2013-02-28 14:34:27 -08:00
Yunqing Wang
6193bc3ba8 Refactor vp9_dequant_idct_add function
Provided a wrapper and removed duplicate code.

Change-Id: Iaef842226ec348422e459202793b001d0983ea30
2013-02-28 14:18:46 -08:00
Scott LaVarnway
aa8fb070b8 Removed vp9_dequantize_b
Change-Id: Ie89bd00d58e30bf4094cb748a282f1dfa81a31d8
2013-02-28 14:08:12 -08:00
Jim Bankoski
8f270acfb2 mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision
Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df
2013-02-28 13:58:15 -08:00
John Koleszar
2eab4372fc Fix use of uninitialized memory in CONFIG_ABOVESPREFMV
The ABOVESPREFMV experiment uses four pixels to the left of the
current block, which don't exist for the left-most column.

Change-Id: I4cf0b42ae8f54c0b3e7b1ed8755704b74fafc39c
2013-02-28 13:48:58 -08:00
Dmitry Kovalev
40fec9b588 Merge "Dequantization code cleanup." into experimental 2013-02-28 13:46:43 -08:00
Dmitry Kovalev
c43906e2e9 Dequantization code cleanup.
Removing redundant variables, using x *= y instead of x = x * y, moving
variable declarations into inner blocks.

Change-Id: I884f95c755f55d51b7c1c6585f10296919063e41
2013-02-28 13:28:05 -08:00
Dmitry Kovalev
0d9cc0a9f0 Code cleanup.
Removing redundant 'extern' keyword, better formatting, code
simplification.

Change-Id: I132fea14f08c706ee9ea147d19464d03f833f25b
2013-02-28 13:18:02 -08:00
John Koleszar
b6a3062d81 Fix incorrect comparison of frame size
The width and height stored in the reference frames are padded out to
a multiple of 16. The Width and Height variables in common are the
displayed size, which may be smaller. The incorrect comparison was
causing scaling related code to be called when it shouldn't have
been. A notable case where this happens is 1080p, since 1088 != 1080.

Change-Id: I55f743eeeeaefbf2e777e193bc9a77ff726e16b5
2013-02-28 11:33:02 -08:00
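The 1080 versus 1088 mismatch in b6a3062d81 follows directly from padding to a multiple of 16. The sketch below shows the arithmetic and a like-for-like comparison; the function names are hypothetical and do not correspond to code in the tree.

```c
#include <assert.h>

/* Round a dimension up to the next multiple of 16, as the padded
 * reference frame sizes are described in the commit. */
int align_to_16(int size) {
  assert(size > 0);
  return (size + 15) & ~15;
}

/* Hypothetical check: scaling is needed only when the displayed sizes
 * differ, so both arguments must be displayed (unpadded) dimensions. */
int display_size_differs(int ref_display_w, int ref_display_h,
                         int cur_display_w, int cur_display_h) {
  return ref_display_w != cur_display_w || ref_display_h != cur_display_h;
}
```

align_to_16(1080) is 1088, so comparing a padded reference height with the displayed height reports a size change for every 1080p frame even though nothing changed.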