Commit Graph

436 Commits

Author SHA1 Message Date
James Zern
fba9772dd2 vp8_first_pass(): avoid floating point div by 0
Change-Id: Id1e6a12db6b0c1d3f64ead8fd8834aadc30fbed2
2013-02-22 12:41:59 -08:00
Jingning Han
936aa281b5 Fixed the buffer overflow issue
The issue that potentially broke the encoding process was due to the fact
that the length of token link is calculated from the total number of tokens
coded, while it is possible, in high bit-rate setting, this length is
greater than the buffer length initially assigned to the cpi->tok.

This patch increases the initially allocated buffer length assigned to
cpi->tok from
(mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)).

It resolves the buffer overflow problem.

Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df
2013-02-22 12:30:35 -08:00
John Koleszar
606a2561d6 Merge "Code cleanup." into experimental 2013-02-22 11:20:20 -08:00
Dmitry Kovalev
548b4dd5f2 Code cleanup.
Removing redundant 'extern' keywords and parentheses, fixing indentation,
making variable names lower case, using short expressions x *= c
instead of x = x * c, minor code simplifications.

Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de
2013-02-22 11:03:14 -08:00
Jingning Han
c67a20994f Merge "Forward butterfly hybrid transform" into experimental 2013-02-22 09:20:26 -08:00
Paul Wilkins
b5f3cb6e37 Merge "Experimental removal of over quant code" into experimental 2013-02-22 08:44:40 -08:00
Paul Wilkins
dbf4942046 Experimental removal of over quant code
The over quant code was added in VP8 post
bitstream freeze to allow compression to lower
data rates

In VP9 the real qualtizer range has been greatly
extended anyway.

Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
2013-02-22 14:00:51 +00:00
Jingning Han
babbd5d170 Forward butterfly hybrid transform
This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT
hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)).
The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N).

Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
2013-02-21 18:24:28 -08:00
Dmitry Kovalev
5a18106fb7 Code cleanup.
Removing redundant 'extern' keywords. Moving VP9DX_BOOL_DECODER from .h
to .c file.

Change-Id: I5a3056cb3d33db7ed3c3f4629675aa8e21014e66
2013-02-21 13:50:15 -08:00
John Koleszar
4674312382 Merge "Code cleanup." into experimental 2013-02-21 10:56:17 -08:00
Dmitry Kovalev
5da8534963 Code cleanup.
Removing redundant 'extern' keyword from function declarations and making
function arguments lower case.

Change-Id: Idae9a2183b067f2b6c85ad84738d275e8bbff9d9
2013-02-21 10:34:33 -08:00
Deb Mukherjee
048f593703 Merge "Refactoring of switchable filter search for speed" into experimental 2013-02-21 09:23:50 -08:00
Deb Mukherjee
28b1db9278 Refactoring of switchable filter search for speed
Refactors the switchable filter search in the rd loop to
improve encode speed.

Uses a piecewise approximation to a closed form expression to estimate
rd cost for a Laplacian source with a given variance and quantization
step-size.

About 40% encode time reduction is achieved.

Results (on a feb 12 baseline) show a slight drop:

derf: -0.019%
yt: +0.010%
std-hd: -0.162%
hd: -0.050%

Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0
2013-02-20 18:34:42 -08:00
Jingning Han
abfd2a4880 Merge "Fixed the buffer overflow issue" into experimental 2013-02-20 16:27:27 -08:00
Jingning Han
232ccc2fbe Fixed the buffer overflow issue
The issue that potentially broke the encoding process was due to the fact
that the length of token link is calculated from the total number of tokens
coded, while it is possible, in high bit-rate setting, this length is
greater than the buffer length initially assigned to the cpi->tok.

This patch increases the initially allocated buffer length assigned to
cpi->tok from
(mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)).

It resolves the buffer overflow problem.

Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df
2013-02-20 15:41:48 -08:00
Yaowu Xu
441f24de3d Merge "Merge lossless experiment" into experimental 2013-02-20 12:27:26 -08:00
Yaowu Xu
d262e26cc7 Merge lossless experiment
Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa
2013-02-20 07:54:28 -08:00
Paul Wilkins
ef01b956d8 Entropy stats output code.
Fixes to make Entropy stats code work again

Change-Id: I62e380481a4eb4c170076ac6ab36f0c2b203e914
2013-02-20 14:33:19 +00:00
Ronald S. Bultje
ae81d3a03f Merge "Minor cosmetic cleanups." into experimental 2013-02-19 08:54:44 -08:00
Ronald S. Bultje
0694ea0ed6 Merge "Prevent filling transform size cache with uninitialized values." into experimental 2013-02-19 08:54:35 -08:00
Yaowu Xu
93d6b86cfd Use lossless for Q0
The commit changes the coding mode to lossless whenever the lowest
quantizer is choosen.

As expected, test results showed no difference for cif and std-hd
set where Q0 is rarely used. For yt and yt-hd set, Q0 is used for
a number of clips, where this commit helped a lot in the high end.

Average over all clips in the sets:
yt: 2.391% 1.017% 1.066%
hd: 1.937%  .764%  .787%

Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765
2013-02-19 06:18:42 -08:00
Ronald S. Bultje
aa84c16da2 Minor cosmetic cleanups.
Change-Id: I13d8ae754827368755575dd699a087b3b11f5b16
2013-02-15 17:21:16 -08:00
Ronald S. Bultje
ebfdaa0e0b Prevent filling transform size cache with uninitialized values.
The 32x32 value in case of splitmv was uninitialized. this leads to
all kind of erratic behaviour down the line. Also fill in dummy values
for superblocks in keyframes (the values are currently unused, but we
run into integer overflows anyway, which makes detecting bad cases
harder). Lastly, in case we did not find any RD value at all, don't
set tx_diff to INT_MIN, but instead set it to zero (since if we couldn't
find a mode, it's unlikely that any particular transform would have made
that worse or better; rather, it's likely equally bad for all tx_sizes).

Change-Id: If236fd3aa2037e5b398d03f3b1978fbbc5ce740e
2013-02-15 17:21:16 -08:00
Ronald S. Bultje
5bb103c486 Merge "Remove Y2 and Y-no-DC token types from the bitstream." into experimental 2013-02-15 17:11:20 -08:00
Jingning Han
e343732a92 Fixed a subtle issue that breaks encoding process
This issue breaks the encoding process of the codebase. The effect
emerges only in particular test sequence at certain bit-rates and
frame limits.

Change-Id: I02e080f2a49624eef9a21c424053dc2a1d902452
2013-02-15 14:49:30 -08:00
Ronald S. Bultje
3af36ea8cc Remove Y2 and Y-no-DC token types from the bitstream.
Change-Id: I7a5314daca993d46b8666ba1ec2ff3766c1e5042
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
48598e30b1 Remove y2dc/ac Q delta values from the bitstream.
Since there is no Y2, these values are always zero. This changes the
bitstream results slightly, hence a separate commit.

Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0
2013-02-15 14:06:30 -08:00
Ronald S. Bultje
46dff5d233 Remove some Y2-related code.
Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
2013-02-15 14:06:25 -08:00
John Koleszar
716db10f0d Merge "Moved vp9_get_coef_band to header file" into experimental 2013-02-14 18:02:55 -08:00
Scott LaVarnway
ae886d6bff Moved vp9_get_coef_band to header file
allowing the compiler to inline.

Change-Id: I66e5caf5e7fefa68a223ff0603aa3f9e11e35dbb
2013-02-14 12:27:25 -08:00
Yaowu Xu
03f28c0a12 Merge "Rewrote fdct16x16" into experimental 2013-02-14 09:06:37 -08:00
Paul Wilkins
45712dc8c8 Merge "Abstract selection of coef band." into experimental 2013-02-14 03:23:31 -08:00
Yunqing Wang
048b9d41a6 Rewrote fdct16x16
Used same algorithm as others.

Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0
2013-02-13 16:19:10 -08:00
Ronald S. Bultje
89a206ef2f Add support for tile rows.
These allow sending partial bitstream packets over the network before
encoding a complete frame is completed, thus lowering end-to-end
latency. The tile-rows are not independent.

Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2
2013-02-13 12:31:00 -08:00
Paul Wilkins
9255ad107f Abstract selection of coef band.
This patch abstracts the selection of the coefficient band
context into a function as a precursor to further experiments
with the coefficient context.

It also removes the large per TX size coefficient band structures
and uses a single matrix for all block sizes within the test function.

This may have an impact on quality (results to follow) but is only an
intermediate step in the process of redefining the context. Also the
quality impact will be larger initially because the default tables will
be out of step with the new banding.

In particular the 4x4 will in this case only use 7 bands. If needed we
can add back block size dependency localized within the function, but
this can follow on after the other changes to the definition of the
context.

Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5
2013-02-13 19:01:25 +00:00
Paul Wilkins
56049d9488 Fixed encoder decoder mismatch.
Reverted part of change
I19981d1ef0b33e4e5732739574f367fe82771a84

That gives rise to an enc/dec mismatch.
As things stand the memsets are still needed.

Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30
2013-02-13 18:56:56 +00:00
Paul Wilkins
0d284ffed1 Abstract the selection of coefficient context.
This is an initial step to facilitate experimentation
with changes to the prior token context used to code
coefficients to take better account of the energy of
preceding tokens.

This patch merely abstracts the selection of context into
two functions and does not alter the output.

Change-Id: I117fff0b49c61da83aed641e36620442f86def86
2013-02-13 18:56:30 +00:00
Paul Wilkins
afa57bfc97 Merge "Remove NEWCOEFCONTEXT experiment." into experimental 2013-02-13 10:41:13 -08:00
Yaowu Xu
f01b08c96c Merge "enable bitstream lossless support" into experimental 2013-02-13 10:26:58 -08:00
Yaowu Xu
d3de97794f Merge "fix the lossless experiment" into experimental 2013-02-13 09:54:35 -08:00
Yaowu Xu
17db5d00be enable bitstream lossless support
1. Added a bit in frame header to  to indicate if a frame is encoded
in lossless mode, so decoder does not make the decision based on Q0
2. Minor changes to make sure that lossy coding works same as when
the lossless experiment is not enabled.
3. Renamed function pointers for transforms to be consistent, using
prefix fwd_txm and inv_txm for forward and inverse respectively

To encode in lossless mode, using "--lossless=1 --min-q=0 --max-q=0"
with vpxenc.

Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
2013-02-13 09:24:39 -08:00
Yaowu Xu
16f25f9dc8 fix the lossless experiment
Change-Id: I95acfc1417634b52d344586ab97f0abaa9a4b256
2013-02-13 09:20:26 -08:00
Paul Wilkins
6a9f0c61a4 Remove NEWCOEFCONTEXT experiment.
Removal of the  NEWCOEFCONTEXT experiment to
reduce code clutter and make it easier to experiment with
some other changes to the coefficient coding context.

Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
2013-02-13 15:12:17 +00:00
Paul Wilkins
649be94cf0 Removal of Hybrid DWT/DCT experiment.
Removal of experiment to simplify code base for other
changes.

Change-Id: If0a33952504558511926ad212bc311fc2bffb19a
2013-02-13 15:08:48 +00:00
Christian Duvivier
097f205289 Merge "Faster vp9_regular_quantize_b_8x8." into experimental 2013-02-12 17:08:00 -08:00
Christian Duvivier
0e4397f0cd Faster vp9_regular_quantize_b_8x8.
A couple of scalar optimizations speeding up quantization by about 1.6x. Overall encoder speedup is around 3%.

Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84
2013-02-12 15:55:58 -08:00
Yunqing Wang
7630cf0c3f Merge "Rewrote fdct8x8" into experimental 2013-02-12 15:52:31 -08:00
John Koleszar
1d60b6bcb5 Merge "Replace as_mv struct with array" into experimental 2013-02-12 13:59:04 -08:00
Ronald S. Bultje
f496f601fb Add tile column size limits (256 pixels min, 4096 pixels max).
This is after discussion with the hardware team. Update the unit test
to take these sizes into account. Split out some duplicate code into
a separate file so it can be shared.

Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5
2013-02-12 10:33:34 -08:00
Yunqing Wang
aa295918ed Rewrote fdct8x8
Use consistent algorithm.

Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3
2013-02-11 22:28:05 -08:00
Jingning Han
f1060e4cd8 Merge "butterfly inverse 4x4 ADST" into experimental 2013-02-11 14:46:06 -08:00
Yunqing Wang
ab2dc6ae57 Merge "Integerization of dct32x32" into experimental 2013-02-11 12:15:26 -08:00
Jingning Han
57e995ff9c butterfly inverse 4x4 ADST
fixed format issues.

Implement the inverse 4x4 ADST using 9 multiplications. For this
particular dimension, the original ADST transform can be
factorized into simpler operations, hence is retained.

Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960
2013-02-11 10:42:39 -08:00
Ronald S. Bultje
5f2e8449b7 Merge "Port sadNxNx4d functions to x86inc.asm." into experimental 2013-02-11 08:20:12 -08:00
Paul Wilkins
aec5bed3db Change rd thresholds and add speed trade off flags.
Experimental tweaks to various thresholds to measure
quality / speed trade off.

Add flag that allows static segmentation to be turned off
and disables it unless in the second pass of a two pass
encode.

Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b
2013-02-11 11:54:36 +00:00
Paul Wilkins
e4f949b55a Merge "Nearest / Zero Mv default entropy tweak." into experimental 2013-02-09 04:21:08 -08:00
John Koleszar
7ca517f755 Replace as_mv struct with array
Replace as_mv.{first, second} with a two element array, so that they
can easily be processed with an index variable.

Change-Id: I1e429155544d2a94a5b72a5b467c53d8b8728190
2013-02-08 20:23:35 -08:00
John Koleszar
dc836109e4 Merge "Pass macroblock index to pick inter functions" into experimental 2013-02-08 20:20:37 -08:00
Ronald S. Bultje
c0ce2ab349 Port sadNxNx4d functions to x86inc.asm.
Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8
2013-02-08 17:59:32 -08:00
Ronald S. Bultje
02ff360b33 Add sad64x64 and sad32x32 SSE2 versions.
Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this
makes them all slightly faster, particularly on x86-64. Remove SSE3
sad16x16 version, since the SSE2 version is now faster.

About 1.5% overall encoding speedup.

Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797
2013-02-08 16:32:25 -08:00
Ronald S. Bultje
639b863d22 Make cost_coeffs() more efficient.
Cache the constant offset in one variable to prevent re-loading that
in each loop iteration, and mark the function as inline so we can use
the fact that the transform size is always known in the caller.

Almost 1% faster encoding overall.

Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a
2013-02-08 16:32:24 -08:00
John Koleszar
6125a1ed81 Pass macroblock index to pick inter functions
Pass the current mb row and column around rather than the
recon_yoffset and recon_uvoffset, since those offsets will
change from predictor to predictor, based on the reference
frame selection.

Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1
2013-02-08 14:25:40 -08:00
John Koleszar
6dfc95fe63 Merge changes Icd1a2a5a,I204d17a1,I3ed92117 into experimental
* changes:
  Initial support for resolution changes on P-frames
  Avoid allocating memory when resizing frames
  Adds a test for the VP8E_SET_SCALEMODE control
2013-02-08 14:20:05 -08:00
John Koleszar
3de8ee6ba1 Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental
* changes:
  Restore SSSE3 subpixel filters in new convolve framework
  Convert subpixel filters to use convolve framework
  Add 8-tap generic convolver
2013-02-08 13:19:47 -08:00
John Koleszar
393b485627 Initial support for resolution changes on P-frames
Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.

Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3
2013-02-08 12:20:30 -08:00
John Koleszar
c03d45def9 Avoid allocating memory when resizing frames
As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory allocated.
Instead, do the allocation on the size of the first frame. We could
make this passed in from the application instead, if we wanted to
support external upscaling.

Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038
2013-02-08 12:20:30 -08:00
John Koleszar
88f99f4ec2 Adds a test for the VP8E_SET_SCALEMODE control
Tests that the external interface to set the internal codec scaling
works as expected. Also updates the test to pull the height from
the decoded frame size rather than parsing the keyframe header,
in anticipation of allowing resolution changes on non-keyframes.

Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17
2013-02-08 12:20:30 -08:00
Yunqing Wang
dbccffe299 Integerization of dct32x32
Test on derf set showed 0.047% overall psnr change.

Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19
2013-02-08 08:50:47 -08:00
Paul Wilkins
bbede82f24 Nearest / Zero Mv default entropy tweak.
Tweak to default mode context to account for the fact
that when there are no non zero motion candidates
Nearest is now the preferred mode for coding a 0,0
vector.

Also resolve duplicate function name and typos.

Change-Id: I76802788d46c84e3d1c771be216a537ab7b12817
2013-02-08 10:16:13 +00:00
Yaowu Xu
e6ad9ab02c move dct/idct constants to a header file
also removed some un-unsed functions.

Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5
2013-02-07 13:51:45 -08:00
Jingning Han
d15e1da494 Butterfly ADST based hybrid transform
Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.

Fixed BUILD warning.

Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has kernel of the
form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.

This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.

Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
2013-02-07 10:07:46 -08:00
Paul Wilkins
29731308c4 Added skip switches for SB32 and SB64
Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.

Impact varies depending on clip from a few %
up to almost 50% on encode speed. Only the
split mode breakout is currently enabled.

Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8
2013-02-07 10:45:41 +00:00
Ronald S. Bultje
5cfd82bcaf Use fdct8x4 instead of fdct4x4 where the block size allows it.
This allows for faster SIMD implementations in the future (currently
there is no speed impact).

Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329
2013-02-06 16:13:02 -08:00
Ronald S. Bultje
aac73df1a7 Use configure checks for various inline keywords.
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
2013-02-06 16:12:56 -08:00
Ronald S. Bultje
a788e0fe63 Add sse2 versions of sub_pixel_variance{32x32,64x64}.
7.5% faster overall encoding.

Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f
2013-02-06 11:20:59 -08:00
Ronald S. Bultje
a001fe9708 Merge "Reindent segmentation code." into experimental 2013-02-06 10:07:30 -08:00
Ronald S. Bultje
55cafb6156 Reindent segmentation code.
Indentation was off by 2 spaces for this particular block.

Change-Id: I1e587b7ad3eff77ade5521252d20c7bb2daa0f6d
2013-02-06 09:18:25 -08:00
John Koleszar
31cbe2ed9a Eliminate tautology
Unreachable code
  that does nothing anyway
      removed forever.

Change-Id: I14105d2dd9dbc9d558f36464055e350dbeb45488
2013-02-06 08:22:59 -08:00
Paul Wilkins
8b4e9c5925 Merge "Change definition of NearestMV." into experimental 2013-02-06 04:06:31 -08:00
Ronald S. Bultje
278df745d2 Fix mismatch after merge of the tiling patch.
Change-Id: I8ecc178b4d4069e721c7fec6d7631c00e4a3e5d5
2013-02-05 17:15:04 -08:00
Ronald S. Bultje
1407bdc243 [WIP] Add column-based tiling.
This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independendly encodable (minus within-frame cost adjustments in
the RD loop) to speed-up hardware & software en/decoders if they used
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.

There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.
t log

Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.

The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.

Resuls: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose about ~1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.

TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
  should not cross tile edges, or we should emulate such borders as
  if they were off-frame, to limit error propagation to within one
  tile only. This doesn't have to be the default behaviour but could
  be an optional bitstream flag.

Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f
2013-02-05 15:43:03 -08:00
Ronald S. Bultje
822864131b Merge "Add SSE3 versions for sad{32x32,64x64}x4d functions." into experimental 2013-02-05 15:40:46 -08:00
Yaowu Xu
c9ae73b251 Merge "rewrite 4x4 idct and fdct" into experimental 2013-02-05 15:26:36 -08:00
Ronald S. Bultje
58c983d109 Add SSE3 versions for sad{32x32,64x64}x4d functions.
Overall encoding about 15% faster.

Change-Id: I176a775c704317509e32eee83739721804120ff2
2013-02-05 15:21:47 -08:00
John Koleszar
7a07eea13f Convert subpixel filters to use convolve framework
Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.

There is no consensus for whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.

This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.

Quality gains on derf positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.

Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
2013-02-05 14:23:17 -08:00
Yaowu Xu
fa36981ec8 rewrite 4x4 idct and fdct
This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.

Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
2013-02-05 11:42:49 -08:00
Paul Wilkins
81043e8d62 Change definition of NearestMV.
This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non zero vector
which means the the compound nearest mv mode can
combine a 0,0 and a non zero vector.

Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a
2013-02-05 17:03:25 +00:00
Paul Wilkins
3ab538767c Re-factor code for rd thresholds.
Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.

Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.

Adjustment of thresholds for quality / speed tradeoff
to follow.

Change-Id: I117ee473157e151a1b93193d5f393449328de20d
2013-02-04 18:48:41 +00:00
Yaowu Xu
c1f611be74 Merge "fix a small bug in 16 point forward dct" into experimental 2013-02-01 05:57:41 -08:00
Frank Galligan
f67d740b34 Add support for x64 and win64 yasm flags.
Some projects must define only win64 for Windows 64bit builds using
yasm.

Change-Id: I1d09590d66a7bfc8b4412e1cc8685978ac60b748
2013-01-31 16:25:37 -08:00
Yaowu Xu
ab1cad9bdd fix a small bug in 16 point forward dct
The commit fixes a minor error in 16 point fdct where in a rotation can
produce result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
2013-01-31 15:39:41 -08:00
Deb Mukherjee
a53be60904 Merge "Adding a frame parallel decoding mode" into experimental 2013-01-30 12:03:45 -08:00
Ronald S. Bultje
b499c24c2f Merge "don't code the branch for the predicted seg_id if that flag is false." into experimental 2013-01-30 10:02:51 -08:00
Ronald S. Bultje
3a4b18bc67 don't code the branch for the predicted seg_id if that flag is false.
Change-Id: Icb6e21dc0c2d9918faa33c8bf70943660df7ad88
2013-01-30 09:30:46 -08:00
Ronald S. Bultje
3febf9707d Default superblock skip flag to 32x32 for skip-blocks.
This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.

Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be
2013-01-29 21:46:31 -08:00
Ronald S. Bultje
b90996c51b Reset skip flag in superblock RD loop.
This is the superblock equivalent of commit 290b83a.

Change-Id: Ib3945dd9e992fa9ec1fdea5a11e17a3cc0e37637
2013-01-29 21:42:56 -08:00
Ronald S. Bultje
5a9da2d906 Merge "Fix block pointer corruption in intra8x8 prediction with 4x4 transform." into experimental 2013-01-29 12:49:42 -08:00
Paul Wilkins
d8e86af263 Merge "Remove eob_max_offset markers." into experimental 2013-01-29 09:29:45 -08:00
Paul Wilkins
5d1c62c639 Merge "Segment Skip Flag" into experimental 2013-01-29 09:29:26 -08:00
Ronald S. Bultje
ffc2e4f4af Fix block pointer corruption in intra8x8 prediction with 4x4 transform.
The RD loop would change the pointer after the first mode (DC) was tested,
leading to corrupt block objects being provided for the others. This
would essentially render the i8x8 predictor useless.

Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8
2013-01-29 09:18:47 -08:00
Paul Wilkins
93762ca9b2 Remove eob_max_offset markers.
Remove eob_max_offset markers and replace
with the generic skip_block flag to indicate
to the quantizer that all coeffs to be set to 0
and eob position set to 0;

Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
2013-01-29 13:39:34 +00:00
Paul Wilkins
0ff9b033b0 Segment Skip Flag
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.

Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
2013-01-28 17:28:04 +00:00
Paul Wilkins
5f2429259f Merge "Simplify Zero bin and zero bin run code." into experimental 2013-01-28 08:35:36 -08:00
Paul Wilkins
8e2c03fbfd Simplify Zero bin and zero bin run code.
Simplification to eliminate a number of very large data
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.

Some further work still needs be done to refactor, rationalize
and optimize the multiple quantizer functions.

The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.

Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
2013-01-28 13:21:10 +00:00
Deb Mukherjee
dfd89f2eab Adding a frame parallel decoding mode
Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.

Also includes some minor clean-ups.

Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d
2013-01-25 17:16:19 -08:00
Ronald S. Bultje
3ca5b35ce5 Merge "Remove "update_context" variable from VP9_COMP context." into experimental 2013-01-25 09:43:42 -08:00
Ronald S. Bultje
0a7b3953f0 Remove "update_context" variable from VP9_COMP context.
The variable is always zero.

Change-Id: Id5cdbecad543bca465a5b1d471badaec7e112c8d
2013-01-24 16:28:53 -08:00
Deb Mukherjee
01cafaab1d Adds an error-resilient mode with test
Adds an error-resilient mode where frames can be continued
to be decoded even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.

Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error-frames are non
droppable, there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.

Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
2013-01-23 21:56:15 -08:00
John Koleszar
2f24ad9e85 Use alt-ref frame context for keyframes
This matches the behavior prior to generalizing the frame context
selection, and intuitively makes sense in that the first forward ref
is immediately after the keyframe, so it's quality is improved a bit
by using the keyframe's entropy context rather than the default.

Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053
2013-01-18 14:40:39 -08:00
Frank Galligan
9ca907b53e libvpx: Fix some warnings.
Change-Id: If8be8b9d28a29631f29c46daea8a226ab3580610
2013-01-18 09:51:57 -08:00
John Koleszar
26bd81b955 Preserve the previous golden frame on golden updates
This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.

Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
2013-01-16 15:57:02 -08:00
John Koleszar
4b65837bc6 Generalize and increase frame coding contexts
Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tieing it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.

Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662
2013-01-16 14:07:27 -08:00
John Koleszar
da832a80e4 Start to anonymize reference frames
Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.

Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.

Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
2013-01-16 14:06:23 -08:00
John Koleszar
394b0a6a30 Update encoder to use fb_idx_ref_cnt
Do reference counting the same way on the encoder as the decoder does,
rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.

Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c
2013-01-15 17:36:39 -08:00
John Koleszar
b8e027989f Remove buffer-to-buffer copy logic
This is the first in a series of commits to add additional reference
frames to the codec. Each frame will be able to update any of the
available references, but copying between references is not
supported.

Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1
2013-01-15 17:36:39 -08:00
Yaowu Xu
9bf73f46f9 fix a number issues that cause failures
During master jenkins verification proces

Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89
2013-01-14 18:32:32 -08:00
John Koleszar
24bc1a7189 Use INT64_MAX instead of LLONG_MAX
These variables have the type int64_t, not long long. long long could
be a larger type than 64 bits. Emulate INT64_MAX for older versions of
MSVC, and remove the unreferenced vpx_ports/vpxtypes.h

Change-Id: Ideaca71838fcd3849d816d5ab17aa347c97d03b0
2013-01-14 15:57:21 -08:00
Ronald S. Bultje
c9071601a2 Remove compound intra-intra experiment.
This experiment gives little gains and adds relatively much code
complexity (and it hinders other experiments), so let's get rid of
it.

Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
2013-01-14 15:47:25 -08:00
Paul Wilkins
e2c696a7aa Merge "Fix compiler warnings" into experimental 2013-01-14 14:20:57 -08:00
Adrian Grange
c7576f97ff Merge "Merge prediction filter" into experimental 2013-01-14 14:18:21 -08:00
Yaowu Xu
113005b11d Fix compiler warnings
The warnings caused verify failure with gerrit for several  commits

Change-Id: I030df8638bd69b8783a3ac58e720ff9f0bfd546c
2013-01-14 13:56:52 -08:00
Adrian Grange
7bcaac3e64 Merge prediction filter
Removed the experimental flag from around the prediction filter.

Change-Id: Ic1dd2db8fe8ac17ed5129f83094d4c5cdd5527d2
2013-01-14 12:57:07 -08:00
Ronald S. Bultje
290b83ab62 Reset x->skip for each iteration in the RD loop.
This prevents ill-defined behaviour, such as setting x->skip for a mode
that is excluded because of frame-level flags (e.g. filter selection,
compound prediction selection), then not breaking out of the RD loop
because the mode is not allowed, but keeping the flag on. Whatever mode
is iterated through next in the RD loop will then carry this flag, and
all sort of bad stuff happens, such as x->skip being set on intra pred
modes.

Change-Id: I5bec46b36e38292174acb1c564b3caf00a9b4b9a
2013-01-14 12:44:32 -08:00
John Koleszar
76ac5b3937 Fix unused variable warnings
Previous commit does not build cleanly on Jenkins with the DWT/DCT
hybrid experiment enabled (--enable-dwtdcthybrid).

Change-Id: Ia67e8f59d17ef2d5200ec6b90dfe6711ed6835a5
2013-01-14 12:12:43 -08:00
Deb Mukherjee
516db21c2c Further enhancements/fixes on dct/dwt hybrid txfm
Fixes some scaling issues. Adds an option to only compute the
dct on the low-low subband for 32x32 and 64x64 blocks using
only a single 16x16 dct after 1 and 2 wavelet decomposition
levels respectively. Also adds an option to use a 8x8 dct
as building block.

Currenlty with the 2/6 filter and with a single 16x16 dct on
the low low band, the reuslts compared to full 32x32 dct is
as follows:
derf: -0.15%
yt: -0.29%
std-hd: -0.18%
hd: -0.6%
These are my current recommended settings, since the 2/6 filter
is very simple.

Results with 8x8 dct are about 0.3% worse.

Change-Id: I00100cdc96e32deced591985785ef0d06f325e44
2013-01-12 16:00:53 -08:00
Paul Wilkins
d27ae620bc Remove INT64_MAX references.
Replace INT64_MAX references with LLONG_MAX
for windows build.

Change-Id: Ib8b45c1e9c15c043b2f54c27ed83b8682b2be34f
2013-01-11 19:45:26 +00:00
Ronald S. Bultje
aa2effa954 Merge tx32x32 experiment.
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
2013-01-10 08:23:59 -08:00
Ronald S. Bultje
6884a83f06 Merge superblocks64 experiment.
Change-Id: If6c88752dffdb566f8d4322f135145270716fb8e
2013-01-09 17:21:40 -08:00
Adrian Grange
7d6b5425d7 New prediction filter
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.

If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.

The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.

The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.

Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
2013-01-09 12:00:39 -08:00
Deb Mukherjee
4b7304ee68 Adds 64x64 hybrid dct/dwt transform
This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition is used on a 64x64 block, followed
by 16x16 DCT on the four lowest subbands. The highest three subbands
are left untransformed after the first level DWT.

Change-Id: I3d48d5800468d655191933894df6b46e15adca56
2013-01-08 14:05:58 -08:00
Ronald S. Bultje
4455036cfc Merge superblocks (32x32) experiment.
Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587
2013-01-08 12:54:45 -08:00
John Koleszar
879cb7d962 Merge vp9-preview changes into experimental branch
Incorportate vp9-preview changes by merging master branch into experimental.

Conflicts:
	test/test.mk
	vp9/common/vp9_filter.c
	vp9/common/vp9_idctllm.c
	vp9/common/vp9_invtrans.h
	vp9/common/vp9_mbpitch.c
	vp9/common/vp9_rtcd_defs.sh
	vp9/common/vp9_systemdependent.h
	vp9/common/vp9_type_aliases.h
	vp9/common/x86/vp9_asm_stubs.c
	vp9/common/x86/vp9_subpixel_mmx.asm
	vp9/decoder/vp9_decodframe.c
	vp9/decoder/vp9_dequantize.c
	vp9/decoder/vp9_dequantize.h
	vp9/decoder/vp9_onyxd_int.h
	vp9/encoder/vp9_bitstream.c
	vp9/encoder/vp9_encodeframe.c
	vp9/encoder/vp9_rdopt.c

Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
2013-01-08 10:19:59 -08:00
Yaowu Xu
c14439c3d3 reset segement map on key frame
This is to fix a decoder crash when decoder skips a number of frame to
continue decoding from a later key frame.

Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
2013-01-08 08:54:45 -08:00
Yaowu Xu
08e207ad04 Merge "minor loop filter refactoring and cleanup" into experimental 2013-01-08 08:40:03 -08:00
Yaowu Xu
d278d01836 minor loop filter refactoring and cleanup
This commit did a couple of minor cleanup/refactoring to prepare for
futher loop filter experiments. It merged y_only version of loop filter
function into the regular one, which makes sure that same logic is used
for functions for picking level and for actual loop filtering.

Change-Id: Id10c94dccd45f58e5310bacfdf6ee63cbb60b86f
2013-01-07 16:23:58 -08:00
Ronald S. Bultje
3ed14846e1 Remove a few redundant function arguments in encodeframe.c.
Also reindent a block of code that was misindented after addition of
the tx32x32 experiment.

Change-Id: Ic3e4aae3effd8a40136da68c9f382af03632ba08
2013-01-07 11:41:49 -08:00
Ronald S. Bultje
c13d9fef42 Re-enable support for static_threshold (encode_breakout).
Change-Id: Ibd7380f478d3127f9db91d0a4fd2fd0dfde961ab
2013-01-07 11:02:14 -08:00
Ronald S. Bultje
e6216d163a Don't use tx32x32 for macroblocks.
Change-Id: Ib674e0153ca360867ab7a20ba291ac9171a01250
2013-01-07 09:40:19 -08:00
Ronald S. Bultje
c3941665e9 64x64 blocksize support.
3.2% gains on std/hd, 1.0% gains on hd.

Change-Id: I481d5df23d8a4fc650a5bcba956554490b2bd200
2013-01-05 18:20:25 -08:00
Adrian Grange
81d1171fd4 Fix mode selection infinite loop bug
Mode selection for SBs could enter an infinite loop because
the interpolation filter mode index was not being reset
correctly.

Change-Id: I4bbe726f29ef5b6836e94884067c46084713cc11
2013-01-04 09:00:47 -08:00
Yaowu Xu
df7ce5a711 Merge "make cost_coeffs() and tokenize_b() consistent" into experimental 2013-01-03 09:57:07 -08:00
Yaowu Xu
818f5698fb Merge "Merge cost_coeffs_2x2() into cost_coeffs()" into experimental 2013-01-03 09:33:21 -08:00
Yaowu Xu
83664f457b make cost_coeffs() and tokenize_b() consistent
Change-Id: I7cdb5c32a1400f88ec36d08ea982e38b77731602
2013-01-03 09:31:47 -08:00
Adrian Grange
259b800832 New interpolation filter selection algorithm
Old Scheme:
When SWITCHABLE filter selection is enabled the encoder
evaluates the use of each interpolation filter type and
selects the best one to use at the MB level. A frame-
level flag can be set to force the use of a particular
filter type for all MBs in a frame if it is more efficient
to encode that way. The logic here involved a Q dependent
threshold that assumed that the second 8-tap filter was
a high-pass filter. However, this requires a trip around
the recode loop. If the frame-level flag indicates use
of a particular filter, the other filters are not
evaluated in the pick_mode loop.

New Scheme:
Each filter type is evaluated at the MB level and a record
of the best filter is kept, irrespective of what filter
is signaled at the frame-level. Once all MBs have been
encoded, a decision is made as to what frame-level mode
to set for the *next* frame. If one filter is used by 80%
or more of the MBs, then this filter is forced since it
is assumed that this will be more efficient if the
next frame has similar characteristics. i.e. there is a
one-frame lag between measuring the filter selection and
setting the frame-level mode to use.

Change-Id: I6a7e7ced8f27e120fafb99db2dc9c6293f8d20f7
2013-01-03 08:12:43 -08:00
Yaowu Xu
bd28510ef9 Merge cost_coeffs_2x2() into cost_coeffs()
Remove special case function cost_coeffs_2x2() and change function
cost_coeffs() to handle 2nd order haar block as it is handle all
other block types already.

Change-Id: I2aac6f81ee0ae9e03d6a8da4f8681d69b79ce41f
2013-01-03 08:00:00 -08:00
Paul Wilkins
cad4a91429 Change INT64_MAX to LLONG_MAX
This is needed to make the windows build work after
the removal of vp9_type_alisases.h.

Change-Id: I8addf38e9f3c8b864e0e30a8916a26e0264dd02c
2013-01-02 18:06:00 +00:00
Paul Wilkins
313d1100af Added update-able mv-ref probabilities.
Part of NEW_MVREF experiment.
Added update-able probabilities.

Change-Id: I5a4fcf4aaed1d0d1dac980f69d535639a3d59401
2013-01-02 14:22:11 +00:00
John Koleszar
5ebe94f9f1 Build fixes to merge vp9-preview into master
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.

Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-26 11:21:09 -08:00
Jim Bankoski
1dffce7f96 add private to assembly files to insure proper chromebuild
Change-Id: I6e43ca73f35401a974ed8ee27738d4318f09fd37
2012-12-20 09:40:18 -08:00
Deb Mukherjee
08f0c7cc9c New previous coef context experiment
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order.  A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.

Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neigbors is used to derive the context.

Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
2012-12-19 18:49:39 -08:00
John Koleszar
05ec800ea4 Use boolcoder API instead of inlining
This patch changes the token packing to call the bool encoder API rather
than inlining it into the token packing function, and similarly removes
a special get_signed case from the detokenizer. This allows easier
experimentation with changing the bool coder as a whole.

Change-Id: I52c3625bbe4960b68cfb873b0e39ade0c82f9e91
2012-12-19 12:52:41 -08:00
Ronald S. Bultje
4cca47b538 Use standard integer types for pixel values and coefficients.
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).

Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
2012-12-18 15:31:19 -08:00
Yaowu Xu
b41c3583ac Merge "correct logic in cnvcontext experiment for tx32x32" into experimental 2012-12-18 14:23:39 -08:00
Ronald S. Bultje
5cab8b7a18 Merge "Give 4x4 scan and coef_band tables a _4x4 suffix." into experimental 2012-12-18 14:17:46 -08:00
Yaowu Xu
de269c8a62 correct logic in cnvcontext experiment for tx32x32
Change-Id: I004ded11983b7fda85793912ebc5c6f266dc5eb5
2012-12-18 13:53:17 -08:00
Yunqing Wang
779c5f28a8 Fix uninitialized warning
Fixed uninitialized warning for txfm_size.

Change-Id: I42b7e802c3e84825d49f34e632361502641b7cbf
2012-12-18 13:19:04 -08:00
Ronald S. Bultje
8986eb5c26 Give 4x4 scan and coef_band tables a _4x4 suffix.
This matches the names of tables for all other transform sizes.

Change-Id: Ia7681b7f8d34c97c27b0eb0e34d490cd0f8d02c6
2012-12-18 10:49:10 -08:00
John Koleszar
1306ba7659 Remove vp9_type_aliases.h
Prefer the standard fixed-size integer typedefs.

Change-Id: Iad75582350669e49a8da3b7facb9c259e9514a5b
2012-12-17 11:32:37 -08:00
Paul Wilkins
d8f5d1b257 Problem of over smoothing with intra modes.
In some cases intra modes in inter frames give
an over smoothed appearance. Especially with
noisy but flat content.

Also in some cases there were problems with key
frame sizing again with very flat but noisy content.

These are temporary changes to help alleviate the
visual problems but will almost certainly hurt metric
results especially at the very low data rate end.

Change-Id: I11549179a19277ffc283d9788bc70168f2a8bdc9
2012-12-17 11:54:17 +00:00
Yaowu Xu
6247b239bc reset segement map on key frame
This is to fix a decoder crash when decoder skips a number of frame to
continue decoding from a later key frame.

Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
2012-12-14 06:35:32 -08:00
Yaowu Xu
2b9ec585d6 fixed an encoder/decoder mismatch
The mismatch was caused by an improper merge of cleanup code around
tokenize_b() and stuff_b() with TX32X32 experiment.

Change-Id: I225ae62f015983751f017386548d9c988c30664c
2012-12-13 15:33:21 -08:00
Deb Mukherjee
7fa3deb1f5 Build fixes with teh super blcoks and 32x32 expts
Change-Id: I3c751f8d57ac7d3b754476dc6ce144d162534e6d
2012-12-13 12:18:38 -08:00
Deb Mukherjee
9c318ee371 Merge "Further improvements on the hybrid dwt/dct expt" into experimental 2012-12-13 11:04:56 -08:00
Deb Mukherjee
210dc5b2db Further improvements on the hybrid dwt/dct expt
Modifies the scanning pattern and uses a floating point 16x16
dct implementation for now to handle scaling better.
Also experiments are in progress with 2/6 and 9/7 wavelets.

Results have improved to within ~0.25% of 32x32 dct for std-hd
and about 0.03% for derf. This difference can probably be bridged by
re-optimizing the entropy stats for these transforms. Currently
the stats used are common between 32x32 dct and dwt/dct.

Experiments are in progress with various scan pattern - wavelet
combinations.

Ideally the subbands should be tokenized separately, and an
experiment will be condcuted next on that.

Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110
2012-12-13 10:37:49 -08:00
Ronald S. Bultje
f4608e3606 Merge "New default coefficient/band probabilities." into experimental 2012-12-13 09:56:50 -08:00
Ronald S. Bultje
5a5df19de3 New default coefficient/band probabilities.
Gives 0.5-0.6% improvement on derf and stdhd, and 1.1% on hd. The
old tables basically derive from times that we had only 4x4 or
only 4x4 and 8x8 DCTs.

Note that some values are filled with 128, because e.g. ADST ever
only occurs as Y-with-DC, as does 32x32; 16x16 ever only occurs
as Y-with-DC or as UV (as complement of 32x32 Y); and 8x8 Y2 ever
only has 4 coefficients max. If preferred, I can add values of
other tables in their place (e.g. use 4x4 2nd order high-frequency
probabilities for 8x8 2nd order), so that they make at least some
sense if we ever implement a larger 2nd order transform for the
8x8 DCT (etc.), please let me know

Change-Id: I917db356f2aff8865f528eb873c56ef43aa5ce22
2012-12-12 16:23:57 -08:00
Scott LaVarnway
b575394e21 Improved vp9_ihtllm_c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function.  For the 1080p test clip used, the decoder
performance improved by 17%.

Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
2012-12-12 15:49:39 -08:00
Ronald S. Bultje
39de1e14ed Merge "Consistently use get_prob(), clip_prob() and newly added clip_pixel()." into experimental 2012-12-12 10:34:14 -08:00
Ronald S. Bultje
4d0ec7aacd Consistently use get_prob(), clip_prob() and newly added clip_pixel().
Add a function clip_pixel() to clip a pixel value to the [0,255] range
of allowed values, and use this where-ever appropriate (e.g. prediction,
reconstruction). Likewise, consistently use the recently added function
clip_prob(), which calculates a binary probability in the [1,255] range.
If possible, try to use get_prob() or its sister get_binary_prob() to
calculate binary probabilities, for consistency.

Since in some places, this means that binary probability calculations
are changed (we use {255,256}*count0/(total) in a range of places,
and all of these are now changed to use 256*count0+(total>>1)/total),
this changes the encoding result, so this patch warrants some extensive
testing.

Change-Id: Ibeeff8d886496839b8e0c0ace9ccc552351f7628
2012-12-12 10:01:19 -08:00
Yaowu Xu
0c35b27689 Merge "clean up tokenize_b() and stuff_b()" into experimental 2012-12-11 13:51:56 -08:00
Yaowu Xu
899f0fc126 clean up tokenize_b() and stuff_b()
Change-Id: I0c1be01aae933243311ad321b6c456adaec1a0f5
2012-12-11 13:32:16 -08:00
Yaowu Xu
6b380c0cfa Merge "experiment with CONTEXT conversion" into experimental 2012-12-11 09:46:36 -08:00
Deb Mukherjee
f09c4cde85 Merge "A bug fix related to switchable filters" into experimental 2012-12-10 12:28:06 -08:00
Deb Mukherjee
14a38a8735 A bug fix related to switchable filters
The switchable count update was mistakenly inside a macro.

Change-Id: Iec04c52ad57034b88312dbaf05eee1f47ce265b3
2012-12-10 12:10:36 -08:00
Paul Wilkins
d124465975 Further changes to mv reference code.
Some further changes and refactoring of mv
reference code and selection of center point for
searches. Mainly relates to not passing so many
different local copies of things around.

Some place holder comments.

Change-Id: I309f10ffe9a9cde7663e7eae19eb594371c8d055
2012-12-10 17:31:51 +00:00
John Koleszar
d1356faeb8 Merge remote-tracking branch 'origin/vp9-preview' into experimental 2012-12-07 17:26:31 -08:00
Yaowu Xu
ab480cede5 experiment with CONTEXT conversion
This commit changed the ENTROPY_CONTEXT conversion between MBs that
have different transform sizes.

In additioin, this commit also did a number of cleanup/bug fix:
1. removed duplicate function vp9_fix_contexts() and changed to use
vp8_reset_mb_token_contexts() for both encoder and decoder
2. fixed a bug in stuff_mb_16x16 where wrong context was used for
the UV.
3. changed reset all context to 0 if a MB is skipped to simplify the
logic.

Change-Id: I7bc57a5fb6dbf1f85eac1543daaeb3a61633275c
2012-12-07 17:25:45 -08:00
Jim Bankoski
26a4918282 Merge "Fix meaninglesss if." into vp9-preview 2012-12-07 17:15:52 -08:00
Ronald S. Bultje
885cf816eb Introduce vp9_coeff_probs/counts/stats/accum types.
Use these, instead of the 4/5-dimensional arrays, to hold statistics,
counts, accumulations and probabilities for coefficient tokens. This
commit also re-allows ENTROPY_STATS to compile.

Change-Id: If441ffac936f52a3af91d8f2922ea8a0ceabdaa5
2012-12-07 16:09:59 -08:00
Frank Galligan
1c0ee77589 Fix meaninglesss if.
Change-Id: I0cb06d77805246fe39d39ad3bc5df3c3f52c7050
2012-12-07 15:44:39 -08:00
Frank Galligan
8d449ce0a9 Remove unused symbols from vp9 asm offsets C files.
Change-Id: I366e6d175da3012f1c8607fd7fad99fbbb616091
2012-12-07 15:38:40 -08:00
Ronald S. Bultje
c456b35fdf 32x32 transform for superblocks.
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.

Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
  transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
  1 bit, or else they won't fit in int16_t (they are 17 bits). Because
  of this, the RD error scoring does not right-shift the MSE score by
  two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
  also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
  simply using the 16x16 luma ones. A future commit will add newly
  generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
  ADST is desired, transform-size selection can scale back to 16x16
  or lower, and use an ADST at that level.

Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
  block than for the rest (DWT pixel differences) of the block. Therefore,
  RD error scoring isn't easily scalable between coefficient and pixel
  domain. Thus, unfortunately, we need to compute the RD distortion in
  the pixel domain until we figure out how to scale these appropriately.

Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
2012-12-07 14:45:05 -08:00
Johann
1009f76566 Use 'vpx_scale' consistently
Change-Id: I178352813d2b8702d081caf405de9dbad9af2cc3
2012-12-05 16:05:44 -08:00
Paul Wilkins
7405040142 Merge "Change to MV reference search." into experimental 2012-12-05 09:14:46 -08:00
Johann
52d350febf Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.

Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.

The 'vp8' functions have not been renamed yet. That is for after the
cleanup.

Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
2012-12-05 08:59:40 -08:00
Johann
a905672906 Remove ARM optimizations from VP9
Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b
2012-12-05 08:59:25 -08:00
John Koleszar
5d91a1e0ae Merge remote-tracking branch 'origin/vp9-preview' into experimental 2012-12-05 08:41:35 -08:00
John Koleszar
4a4d2aa55c vp9_bilinear_filters_mmx: add missing extern specifiers
Change-Id: Ibabf18947f90cb4f45052763ebf44cfb8209bd8b
2012-12-05 08:27:48 -08:00
Paul Wilkins
4cc657ec6e Change to MV reference search.
This patch reduces the cpu cost of the MV ref
search by only allowing insert for candidates
that would be in the current top 4.

This could alter the outcome and slightly favors
near candidates which are tested first but also
limits the worst case loop count to 4 and means in
many cases it will drop out and not happen.

Change-Id: Idd795a825f9fd681f30f4fcd550c34c38939e113
2012-12-05 14:03:45 +00:00
Johann
d138262ac0 Merge "Begin to refactor vpx_scale usage in VP9" into experimental 2012-12-04 15:23:42 -08:00
Frank Galligan
48556db7b2 Merge "vp9: Fix assert check." into vp9-preview 2012-12-03 17:29:46 -08:00
Yaowu Xu
806d05e1a8 merged optimiz_b_16x16() into optmize_b()
The commit changed the trellis quantization function optimize_b() to
work for MBs using all transform sizes, and eliminated the function
for MB using 16x16 transform only, optimize_b_16x16.

Change-Id: I3fa650587ab5198ed16315b38754783a72b33ba2
2012-12-03 14:53:45 -08:00
Johann
57e72208b3 Merge "Remove ARM optimizations from VP9" into experimental 2012-12-03 13:54:38 -08:00
Johann
c6bd29e2f5 Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.

Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.

The 'vp8' functions have not been renamed yet. That is for after the
cleanup.

Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
2012-12-03 12:51:56 -08:00
Johann
34591b54dd Remove ARM optimizations from VP9
Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b
2012-12-03 12:50:15 -08:00
Jim Bankoski
b95338c7ab Merge "fixes --disable-vp9-encoder" into vp9-preview 2012-12-03 12:41:31 -08:00
Jim Bankoski
d9038b3c60 fixes --disable-vp9-encoder
Change-Id: I467bf0fdf3b35326bcce58d5459e6d2dbfd6c5e5
2012-12-03 12:21:16 -08:00
Frank Galligan
0d687ed22b vp9: Fix assert check.
Change-Id: If0cc1ab60dff6abd67dae7c7b3dc83a1afd7fe65
2012-12-03 12:18:59 -08:00
Frank Galligan
3e0ea7f6e1 vp9: Remove superfluous command.
- vpx_calloc is called on arf_not_zz above.
- Note The removed vpx_memset call had an issue with sizeof.

Change-Id: I86fd7a167d0a042e581e613e2a6c0b5e63073fc6
2012-12-03 10:26:15 -08:00
Deb Mukherjee
8b92f1e023 Supports inter-intra prediction with superblocks
Adds support for compound inter-intra prediction with superblocks.
Also, fixes a bug that disabled intra modes for superblocks.

Change-Id: I4d711317e1bc19df8c2f32dc645429f7fff31036
2012-12-01 15:19:55 -08:00