Commit Graph

439 Commits

Author SHA1 Message Date
Paul Wilkins
81043e8d62 Change definition of NearestMV.
This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non-zero vector,
which means the compound nearest mv mode can
combine a 0,0 and a non-zero vector.

Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a
2013-02-05 17:03:25 +00:00
Scott LaVarnway
77440d508b Merge "Added vp9_short_idct1_32x32_c" into experimental 2013-02-05 08:56:05 -08:00
Scott LaVarnway
5780c4cbd5 Added vp9_short_idct1_32x32_c
and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1.  For the test clip used, the decoder performance improved
by 21+%.  Based on Yaowu's 16 point idct work.

Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
2013-02-04 16:49:17 -08:00
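
The eob-based shortcuts described in 5780c4cbd5 (together with the earlier eob == 0 check in 9d4c26531b) can be pictured with a small sketch. This is an illustration in Python, not the C code in the tree: `dequant_idct_add`, `full_idct` and `dc_only_offset` are hypothetical names, and the real DC scaling and rounding are not reproduced.

```python
def clip_pixel(value):
    """Clamp a reconstructed sample to the 8-bit pixel range."""
    return max(0, min(255, value))


def dequant_idct_add(coeffs, eob, pred, full_idct, dc_only_offset):
    """Add the inverse-transformed residual to the prediction block.

    full_idct and dc_only_offset are hypothetical callables standing in
    for the full inverse transform and the DC-only shortcut.
    """
    if eob == 0:
        # No coefficients at all: the reconstruction is the prediction.
        return [row[:] for row in pred]
    if eob == 1:
        # Only the DC coefficient is set: the residual is one constant.
        offset = dc_only_offset(coeffs[0])
        return [[clip_pixel(p + offset) for p in row] for row in pred]
    # General case: run the full inverse transform.
    residual = full_idct(coeffs)
    return [[clip_pixel(p + r) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

The point of the dispatch is that the two cheapest cases never touch the full 32x32 transform, which is presumably where the reported decoder gains come from.
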
Paul Wilkins
3ab538767c Re-factor code for rd thresholds.
Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.

Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.

Adjustment of thresholds for quality / speed tradeoff
to follow.

Change-Id: I117ee473157e151a1b93193d5f393449328de20d
2013-02-04 18:48:41 +00:00
Yaowu Xu
1eb79dc1dc re-write 8 point idct
to be consistent with idct16 and idct32.

Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204
2013-02-04 07:31:25 -08:00
Yaowu Xu
ccaaeb4b5a a couple of minor fixes
fixed a function prototype to prevent compiler warnings;
removed a function not in use;
un-capitalize "Refstride" to ref_stride

Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba
2013-02-04 07:19:32 -08:00
Yaowu Xu
af4c9d2f88 Merge "Changes 16 point idct" into experimental 2013-02-01 08:22:20 -08:00
Yaowu Xu
c1f611be74 Merge "fix a small bug in 16 point forward dct" into experimental 2013-02-01 05:57:41 -08:00
Yaowu Xu
91e0e80142 Changes 16 point idct
This commit changes the inverse 16 point dct to use the same algorithm
as the one for 32 point idct. In fact, the inverse 16 point dct now uses the
exact source code of the even portion of the 32 point idct.

Tests showed the current implementation has significantly better accuracy
than the previous version. With this implementation and the minor bug
fix on the forward 16 point dct, encoding tests showed about 0.2% better
compression on the CIF set; test results on the std-hd setting are pending.

Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
2013-01-31 19:52:18 -08:00
Yaowu Xu
ab1cad9bdd fix a small bug in 16 point forward dct
This commit fixes a minor error in the 16 point fdct, wherein a rotation can
produce a result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
2013-01-31 15:39:41 -08:00
Yaowu Xu
c94e55add0 Merge "A fix point implementation of 32x32 idct" into experimental 2013-01-31 10:48:01 -08:00
Yaowu Xu
5149d7f7bd A fix point implementation of 32x32 idct
This commit changes the 32x32 idct to use integers only. The algorithm
was taken directly from "A Fast Computational Algorithm for the
Discrete Cosine Transform" by W. Chen, et al., which was published in
IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The signal
flow graph in the original paper is for a 32 point forward dct; the
current implementation of the inverse DCT was done by following the graph
in the reverse direction.

With this implementation, the 32 point inverse dct contains a 16 point
inverse dct in its even portion, similarly the 16 point idct further
contains 8 point and 4 point inverse dcts.

As of patch 4, encoding tests showed there is no compression loss when
compared against the floating point baseline. Numbers even showed very
small positives (cif: .01%, std-hd: .05%).

Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378
2013-01-31 09:45:49 -08:00
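
The nesting described in 5149d7f7bd (a 2N point inverse dct containing an N point inverse dct in its even portion) can be checked numerically. The sketch below uses a plain floating point, unnormalized inverse DCT rather than the integer implementation from the commit; `idct` and `idct_split` are names chosen for this illustration only.

```python
import math


def idct(coeffs):
    """Unnormalized floating point inverse DCT (DCT-III) of any length."""
    n_pts = len(coeffs)
    out = []
    for n in range(n_pts):
        acc = coeffs[0] / 2.0
        for k in range(1, n_pts):
            acc += coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
        out.append(acc)
    return out


def idct_split(coeffs):
    """Same transform, built from a half-size IDCT on the even coefficients."""
    n_pts = len(coeffs)
    half = n_pts // 2
    # Even portion: a complete half-size inverse DCT.
    even = idct(coeffs[0::2])
    # Odd portion: direct sum over the odd-indexed coefficients.
    odd = []
    for n in range(half):
        acc = 0.0
        for k in range(1, n_pts, 2):
            acc += coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
        odd.append(acc)
    # Butterfly: the even part is symmetric, the odd part antisymmetric.
    out = [0.0] * n_pts
    for n in range(half):
        out[n] = even[n] + odd[n]
        out[n_pts - 1 - n] = even[n] - odd[n]
    return out
```

Applying the same split to the half-size transform gives the 16, 8 and 4 point chain mentioned in the commit message.
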
Deb Mukherjee
a53be60904 Merge "Adding a frame parallel decoding mode" into experimental 2013-01-30 12:03:45 -08:00
Ronald S. Bultje
b499c24c2f Merge "don't code the branch for the predicted seg_id if that flag is false." into experimental 2013-01-30 10:02:51 -08:00
Ronald S. Bultje
3a4b18bc67 don't code the branch for the predicted seg_id if that flag is false.
Change-Id: Icb6e21dc0c2d9918faa33c8bf70943660df7ad88
2013-01-30 09:30:46 -08:00
Ronald S. Bultje
4d53a95a34 Merge "Default superblock skip flag to 32x32 for skip-blocks." into experimental 2013-01-30 09:12:17 -08:00
Ronald S. Bultje
de6718a3b9 Merge "Reset skip flag in superblock RD loop." into experimental 2013-01-30 09:12:02 -08:00
Deb Mukherjee
d28750537e Merge "Further improvement on compound inter-intra expt" into experimental 2013-01-30 08:38:17 -08:00
Ronald S. Bultje
3febf9707d Default superblock skip flag to 32x32 for skip-blocks.
This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.

Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be
2013-01-29 21:46:31 -08:00
Ronald S. Bultje
b90996c51b Reset skip flag in superblock RD loop.
This is the superblock equivalent of commit 290b83a.

Change-Id: Ib3945dd9e992fa9ec1fdea5a11e17a3cc0e37637
2013-01-29 21:42:56 -08:00
Ronald S. Bultje
2f6fce3e5a Write only visible area (for better comparison with rec.yuv).
Change-Id: I32bf4ee532a15af78619cbcd8a193224029fab50
2013-01-29 16:58:52 -08:00
Ronald S. Bultje
5a9da2d906 Merge "Fix block pointer corruption in intra8x8 prediction with 4x4 transform." into experimental 2013-01-29 12:49:42 -08:00
Ronald S. Bultje
64401f838f Merge "Fix overread/write reported by valgrind if (mb_cols) & 3 != 0." into experimental 2013-01-29 12:49:22 -08:00
Paul Wilkins
d8e86af263 Merge "Remove eob_max_offset markers." into experimental 2013-01-29 09:29:45 -08:00
Paul Wilkins
5d1c62c639 Merge "Segment Skip Flag" into experimental 2013-01-29 09:29:26 -08:00
Scott LaVarnway
8b7eced6fe Merge "Added eob == 0 check to vp9_dequant_idct_add_32x32_c" into experimental 2013-01-29 09:19:58 -08:00
Ronald S. Bultje
ffc2e4f4af Fix block pointer corruption in intra8x8 prediction with 4x4 transform.
The RD loop would change the pointer after the first mode (DC) was tested,
leading to corrupt block objects being provided for the others. This
would essentially render the i8x8 predictor useless.

Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8
2013-01-29 09:18:47 -08:00
Paul Wilkins
93762ca9b2 Remove eob_max_offset markers.
Remove eob_max_offset markers and replace them
with the generic skip_block flag to indicate
to the quantizer that all coeffs should be set to 0
and the eob position set to 0.

Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
2013-01-29 13:39:34 +00:00
Deb Mukherjee
3b04d467ac Further improvement on compound inter-intra expt
Adds a special combination mode specific to intra prediction
mode D45.

Current results with the compound inter/intra experiment:
derf: 0.2%
yt: 0.55%
std-hd: 0.75%
hd: 0.74%

Change-Id: I8976bdf3b9b0b66ab8c5c628bbc62c14fc72ca86
2013-01-29 00:21:29 -08:00
Paul Wilkins
0ff9b033b0 Segment Skip Flag
First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.

Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
2013-01-28 17:28:04 +00:00
Paul Wilkins
5f2429259f Merge "Simplify Zero bin and zero bin run code." into experimental 2013-01-28 08:35:36 -08:00
Paul Wilkins
8e2c03fbfd Simplify Zero bin and zero bin run code.
Simplification to eliminate a number of very large
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.

Some further work still needs to be done to refactor, rationalize
and optimize the multiple quantizer functions.

The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.

Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
2013-01-28 13:21:10 +00:00
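
The capped lookup described in 8e2c03fbfd can be sketched as a single 16 element table indexed by the zero run length, with every run of 15 or more sharing the last entry. The table values below are placeholders invented for this illustration, not the values used by the encoder, and `zbin_boost_for_run` is a hypothetical name.

```python
# Placeholder boost values (assumed for illustration only): one entry
# per zero-run length 0..15, shared by all transform sizes.
ZBIN_BOOST = [0, 0, 0, 8, 8, 8, 10, 12, 14, 16, 20, 24, 28, 32, 36, 40]


def zbin_boost_for_run(zero_run):
    """Return the zero-bin boost for a run of zeros, saturating at run 15."""
    capped_run = min(zero_run, len(ZBIN_BOOST) - 1)
    return ZBIN_BOOST[capped_run]
```

Because the index saturates, the table size no longer depends on the number of coefficients in the transform.
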
Ronald S. Bultje
9dc9f07fb8 Fix overread/write reported by valgrind if (mb_cols) & 3 != 0.
We'd backup and restore all cols for a 64x64 SB, but the array wouldn't
be big enough to hold all that data.

Change-Id: Ic68ea721bf07e0b2f3937bd16b0b734bcc743ce1
2013-01-25 17:18:08 -08:00
Deb Mukherjee
dfd89f2eab Adding a frame parallel decoding mode
Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.

Also includes some minor clean-ups.

Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d
2013-01-25 17:16:19 -08:00
Ronald S. Bultje
3ca5b35ce5 Merge "Remove "update_context" variable from VP9_COMP context." into experimental 2013-01-25 09:43:42 -08:00
Scott LaVarnway
9d4c26531b Added eob == 0 check to vp9_dequant_idct_add_32x32_c
Added a quick eob == 0 check.  Once the integer version of the dct32x32 is
complete, we can check for other eob cases.

For the 1080p clip used, the decoder performance improved by 4%.

Change-Id: I9390b6ed3c8be0c0c0a0c44c578d9a031d6e026e
2013-01-24 17:09:56 -08:00
Ronald S. Bultje
0a7b3953f0 Remove "update_context" variable from VP9_COMP context.
The variable is always zero.

Change-Id: Id5cdbecad543bca465a5b1d471badaec7e112c8d
2013-01-24 16:28:53 -08:00
Paul Wilkins
fcb4a25cd5 Mvref speedup
Quality / decode speed trade off changes.
Simpler insert method without sort. Quality impact small.

Change-Id: Id0c0941bc508d985405abd06a13ffe7489170b62
2013-01-24 17:26:37 +00:00
Scott LaVarnway
70019f6070 Merge "Intrinsic version of loopfilter now matches C code" into experimental 2013-01-24 08:45:22 -08:00
Deb Mukherjee
01cafaab1d Adds an error-resilient mode with test
Adds an error-resilient mode where decoding of frames can continue
even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.

Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error frames are
non-droppable, there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.

Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
2013-01-23 21:56:15 -08:00
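
The second check in the test added by 01cafaab1d compares the decoder's and encoder's versions of a frame by PSNR. A minimal sketch of that kind of check follows; it treats a frame as a flat list of 8-bit samples, and `psnr` and `mismatch_acceptable` are hypothetical helpers, not functions from the test itself.

```python
import math


def psnr(frame_a, frame_b, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    sse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
    if sse == 0:
        return math.inf  # identical frames
    mse = sse / len(frame_a)
    return 10.0 * math.log10(peak * peak / mse)


def mismatch_acceptable(encoder_frame, decoder_frame, min_db=20.0):
    """True when the mismatch PSNR meets the 20 dB floor from the commit."""
    return psnr(encoder_frame, decoder_frame) >= min_db
```
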
Deb Mukherjee
ebb1157cde Merge "Modifies the comp inter-intra expt" into experimental 2013-01-23 09:43:07 -08:00
Scott LaVarnway
6a997400ff Intrinsic version of loopfilter now matches C code
Updated the intrinsic code to match Yaowu's latest loopfilter change.
(I584393906c4f5f948a581d6590959522572743bb)

The decoder performance improved by ~30% for the test clip used.

Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
2013-01-23 09:31:40 -08:00
John Koleszar
bed59eb8de Merge changes Ia82cef79,I7324a75a,I7b66daad,I73344451,I91dc210f,I5945b5ce into experimental
* changes:
  Use alt-ref frame context for keyframes
  Preserve the previous golden frame on golden updates
  Generalize and increase frame coding contexts
  Start to anonymize reference frames
  Update encoder to use fb_idx_ref_cnt
  Remove buffer-to-buffer copy logic
2013-01-22 08:31:55 -08:00
John Koleszar
2f24ad9e85 Use alt-ref frame context for keyframes
This matches the behavior prior to generalizing the frame context
selection, and intuitively makes sense in that the first forward ref
is immediately after the keyframe, so its quality is improved a bit
by using the keyframe's entropy context rather than the default.

Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053
2013-01-18 14:40:39 -08:00
Yaowu Xu
b95ed6883a a minor change to a portion of loop filtering
The loop filtering used for an MB edge or an internal edge of an MB using the
8x8 transform was reading 5 pixels on each side and writing 3 pixels on each
side. Following a suggestion from Aki and Scott on hardware & software
performance, this commit changes it to read 4 pixels on each side and write
3 pixels on each side.

Change-Id: I584393906c4f5f948a581d6590959522572743bb
2013-01-18 10:44:13 -08:00
John Koleszar
26bd81b955 Preserve the previous golden frame on golden updates
This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.

Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
2013-01-16 15:57:02 -08:00
John Koleszar
4b65837bc6 Generalize and increase frame coding contexts
Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tying it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.

Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662
2013-01-16 14:07:27 -08:00
John Koleszar
da832a80e4 Start to anonymize reference frames
Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.

Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.

Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
2013-01-16 14:06:23 -08:00
John Koleszar
394b0a6a30 Update encoder to use fb_idx_ref_cnt
Do reference counting the same way on the encoder as the decoder does,
rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.

Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c
2013-01-15 17:36:39 -08:00
John Koleszar
b8e027989f Remove buffer-to-buffer copy logic
This is the first in a series of commits to add additional reference
frames to the codec. Each frame will be able to update any of the
available references, but copying between references is not
supported.

Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1
2013-01-15 17:36:39 -08:00
Yaowu Xu
9bf73f46f9 fix a number of issues that cause failures
during the master Jenkins verification process

Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89
2013-01-14 18:32:32 -08:00
Deb Mukherjee
b34838bea5 Modifies the comp inter-intra expt
Uses a single 1D table to implement the weighting of the predictors
for the compound inter-intra experiment.

Change-Id: I204ffbe4f9fc79d5d43b6c724ad253d800461012
2013-01-14 17:32:26 -08:00
John Koleszar
24bc1a7189 Use INT64_MAX instead of LLONG_MAX
These variables have the type int64_t, not long long. long long could
be a larger type than 64 bits. Emulate INT64_MAX for older versions of
MSVC, and remove the unreferenced vpx_ports/vpxtypes.h

Change-Id: Ideaca71838fcd3849d816d5ab17aa347c97d03b0
2013-01-14 15:57:21 -08:00
Ronald S. Bultje
c9071601a2 Remove compound intra-intra experiment.
This experiment gives little gain and adds a relatively large amount of
code complexity (and it hinders other experiments), so let's get rid of
it.

Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
2013-01-14 15:47:25 -08:00
Yaowu Xu
741fbe9656 Merge experiment "subpelrefmv"
Change-Id: Iac7f3d108863552b850c92c727e00c95571c9e96
2013-01-14 15:18:47 -08:00
Yaowu Xu
f7dab60096 Merge experiment "widerlpf"
Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86
2013-01-14 15:17:35 -08:00
Yaowu Xu
d8c5bceee5 Merge "changed UV plane loop filtering for TX_8X8" into experimental 2013-01-14 14:47:31 -08:00
Yaowu Xu
8750414368 Merge "change to evaluate reference mvs using above only" into experimental 2013-01-14 14:40:38 -08:00
Yaowu Xu
ad9a16ed17 changed UV plane loop filtering for TX_8X8
In commit 9a1d73d, loop filtering was added for UV 4x4 boundaries
when TX_8X8 is used by a MB. This commit further refined the decision
to be based on the actual transform used for the UV planes. When
UV planes use 4x4 transform, i.e. when prediction mode used is either
I8X8_PRED or SPLITMV, UV planes are filtered on 4x4 boundaries, and no
filtering is applied on 4x4 block boundaries when UV planes use 8X8
transform.

Change-Id: Ibb404face0a1d129b4b4abaf67c55d82e8df8bec
2013-01-14 14:28:20 -08:00
Paul Wilkins
e2c696a7aa Merge "Fix compiler warnings" into experimental 2013-01-14 14:20:57 -08:00
Adrian Grange
c7576f97ff Merge "Merge prediction filter" into experimental 2013-01-14 14:18:21 -08:00
Yaowu Xu
fdf8654189 change to evaluate reference mvs using above only
Change-Id: Ibcc342efac0a9be7a21d9b2c09984d9e16bbb225
2013-01-14 14:01:40 -08:00
Yaowu Xu
113005b11d Fix compiler warnings
The warnings caused verify failures with gerrit for several commits

Change-Id: I030df8638bd69b8783a3ac58e720ff9f0bfd546c
2013-01-14 13:56:52 -08:00
Adrian Grange
7bcaac3e64 Merge prediction filter
Removed the experimental flag from around the prediction filter.

Change-Id: Ic1dd2db8fe8ac17ed5129f83094d4c5cdd5527d2
2013-01-14 12:57:07 -08:00
Ronald S. Bultje
290b83ab62 Reset x->skip for each iteration in the RD loop.
This prevents ill-defined behaviour, such as setting x->skip for a mode
that is excluded because of frame-level flags (e.g. filter selection,
compound prediction selection), then not breaking out of the RD loop
because the mode is not allowed, but keeping the flag on. Whatever mode
is iterated through next in the RD loop will then carry this flag, and
all sorts of bad things happen, such as x->skip being set on intra pred
modes.

Change-Id: I5bec46b36e38292174acb1c564b3caf00a9b4b9a
2013-01-14 12:44:32 -08:00
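
The stale-flag problem fixed by 290b83ab62 can be reproduced in a few lines. The sketch below is an illustration only: `MacroblockState`, `rd_pick_mode`, `demo_evaluate` and the mode names are hypothetical stand-ins for the encoder's RD loop, with a `reset_skip` switch so that the behaviour before and after the fix can be compared.

```python
class MacroblockState:
    """Hypothetical stand-in for the encoder's per-macroblock state (x)."""

    def __init__(self):
        self.skip = False


def rd_pick_mode(x, modes, evaluate, is_allowed, reset_skip=True):
    """Return (mode, cost, skip) for the cheapest allowed mode."""
    best = None
    for mode in modes:
        if reset_skip:
            # The fix: every iteration starts with a clean flag.
            x.skip = False
        cost = evaluate(x, mode)  # may set x.skip as a side effect
        if not is_allowed(mode):
            # Excluded by a frame-level flag: move on without breaking out.
            continue
        if best is None or cost < best[1]:
            best = (mode, cost, x.skip)
    return best


def demo_evaluate(x, mode):
    """Toy evaluator: only the excluded inter mode sets the skip flag."""
    costs = {"FILTERED_INTER": 1, "INTRA_DC": 5}
    if mode == "FILTERED_INTER":
        x.skip = True
    return costs[mode]
```

With `reset_skip=False` the intra mode that follows the excluded inter mode inherits `skip = True`, which is the situation the commit message describes.
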
John Koleszar
76ac5b3937 Fix unused variable warnings
Previous commit does not build cleanly on Jenkins with the DWT/DCT
hybrid experiment enabled (--enable-dwtdcthybrid).

Change-Id: Ia67e8f59d17ef2d5200ec6b90dfe6711ed6835a5
2013-01-14 12:12:43 -08:00
Deb Mukherjee
516db21c2c Further enhancements/fixes on dct/dwt hybrid txfm
Fixes some scaling issues. Adds an option to only compute the
dct on the low-low subband for 32x32 and 64x64 blocks using
only a single 16x16 dct after 1 and 2 wavelet decomposition
levels respectively. Also adds an option to use a 8x8 dct
as building block.

Currently, with the 2/6 filter and with a single 16x16 dct on
the low-low band, the results compared to the full 32x32 dct are
as follows:
derf: -0.15%
yt: -0.29%
std-hd: -0.18%
hd: -0.6%
These are my current recommended settings, since the 2/6 filter
is very simple.

Results with 8x8 dct are about 0.3% worse.

Change-Id: I00100cdc96e32deced591985785ef0d06f325e44
2013-01-12 16:00:53 -08:00
Jim Bankoski
e42b280e11 Merge "WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w" into experimental 2013-01-11 17:15:41 -08:00
Scott LaVarnway
b20ce07d76 WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w
and vp9_mb_lpf_vertical_edge_w_sse2.  This was quickly done so we can
run some tests over the weekend.  Future commits will optimize/refactor these
functions further.

The decoder performance improved by ~17% for the clip used.

Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
2013-01-11 17:11:04 -08:00
Jim Bankoski
385bea686b Merge "Upstream changes from Chromium Android Clang build." into experimental 2013-01-11 17:06:26 -08:00
Yaowu Xu
bbe1c9257f Merge "Add loop filtering for UV plane" into experimental 2013-01-11 16:56:39 -08:00
Yaowu Xu
9a1d73d036 Add loop filtering for UV plane
The UV planes are filtered on block boundaries within a MB when only the
8x8 block boundaries are filtered for Y.

Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1
2013-01-11 16:32:06 -08:00
Frank Galligan
bc45f23192 Upstream changes from Chromium Android Clang build.
See https://codereview.chromium.org/11875006/

Change-Id: Ied2a17df2b3222635f84aef120eaa9feb53750d2
2013-01-11 15:37:23 -08:00
Scott LaVarnway
9dc69dfb70 Merge "Initial sse2 version of the wide loopfilters" into experimental 2013-01-11 15:34:26 -08:00
Scott LaVarnway
4987c0f07e Initial sse2 version of the wide loopfilters
Updated the rtcd_defs and used the sse2 uv version
of the loopfilter.  The performance improved by ~8%
for the test clip used.

Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36
2013-01-11 14:54:14 -08:00
Paul Wilkins
d27ae620bc Remove INT64_MAX references.
Replace INT64_MAX references with LLONG_MAX
for windows build.

Change-Id: Ib8b45c1e9c15c043b2f54c27ed83b8682b2be34f
2013-01-11 19:45:26 +00:00
Yaowu Xu
d5a8b62d06 Merge "Reduce the usage of widerlpf" into experimental 2013-01-11 11:15:43 -08:00
Jim Bankoski
9431536045 rtcd for new wider loop filters
Change-Id: I8826bcdcf72ba6d86bde31cd13902a710399805c
2013-01-11 09:45:45 -08:00
Yaowu Xu
6c9fb22e13 Reduce the usage of widerlpf
This commit stops using the wider lpf within a superblock when the
32x32 transform is used for the block.

This commit also switches to the shorter version of loop filtering
for UV planes.

Change-Id: I344c1fb9a3be9d1200782a788bcb0b001fedcff8
2013-01-10 20:15:47 -08:00
Ronald S. Bultje
aa2effa954 Merge tx32x32 experiment.
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
2013-01-10 08:23:59 -08:00
Ronald S. Bultje
460501fe84 Merge "Merge superblocks64 experiment." into experimental 2013-01-10 08:18:33 -08:00
Ronald S. Bultje
6884a83f06 Merge superblocks64 experiment.
Change-Id: If6c88752dffdb566f8d4322f135145270716fb8e
2013-01-09 17:21:40 -08:00
Yaowu Xu
51bae955e6 experiment a wider loop filter for MB border
when larger transforms are used

Change-Id: I25251442b44bf251df4c25a1c1fcf71fb2ad913b
2013-01-09 16:39:05 -08:00
Adrian Grange
7d6b5425d7 New prediction filter
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.

If the pred-filter experiment is enabled, three interpolation
filters are tested during mode selection: the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.

The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.

The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.

Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
2013-01-09 12:00:39 -08:00
Deb Mukherjee
4b7304ee68 Adds 64x64 hybrid dct/dwt transform
This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition are used on a 64x64 block, followed
by 16x16 DCT on the four lowest subbands. The highest three subbands
are left untransformed after the first level DWT.

Change-Id: I3d48d5800468d655191933894df6b46e15adca56
2013-01-08 14:05:58 -08:00
Ronald S. Bultje
cd0f36b24f Merge "Merge superblocks (32x32) experiment." into experimental 2013-01-08 13:31:37 -08:00
Yunqing Wang
f1c56a8c8c Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental 2013-01-08 12:59:08 -08:00
Ronald S. Bultje
4455036cfc Merge superblocks (32x32) experiment.
Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587
2013-01-08 12:54:45 -08:00
Yunqing Wang
8d568312a2 vp9_sub_pixel_variance16x2 SSE2 optimization
About 5% decoder speedup.

Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794
2013-01-08 12:01:55 -08:00
John Koleszar
879cb7d962 Merge vp9-preview changes into experimental branch
Incorporate vp9-preview changes by merging master branch into experimental.

Conflicts:
	test/test.mk
	vp9/common/vp9_filter.c
	vp9/common/vp9_idctllm.c
	vp9/common/vp9_invtrans.h
	vp9/common/vp9_mbpitch.c
	vp9/common/vp9_rtcd_defs.sh
	vp9/common/vp9_systemdependent.h
	vp9/common/vp9_type_aliases.h
	vp9/common/x86/vp9_asm_stubs.c
	vp9/common/x86/vp9_subpixel_mmx.asm
	vp9/decoder/vp9_decodframe.c
	vp9/decoder/vp9_dequantize.c
	vp9/decoder/vp9_dequantize.h
	vp9/decoder/vp9_onyxd_int.h
	vp9/encoder/vp9_bitstream.c
	vp9/encoder/vp9_encodeframe.c
	vp9/encoder/vp9_rdopt.c

Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
2013-01-08 10:19:59 -08:00
Yaowu Xu
c14439c3d3 reset segment map on key frame
This is to fix a decoder crash when the decoder skips a number of frames to
continue decoding from a later key frame.

Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
2013-01-08 08:54:45 -08:00
Yaowu Xu
08e207ad04 Merge "minor loop filter refactoring and cleanup" into experimental 2013-01-08 08:40:03 -08:00
Yaowu Xu
d278d01836 minor loop filter refactoring and cleanup
This commit does a couple of minor cleanups/refactorings to prepare for
further loop filter experiments. It merges the y_only version of the loop
filter function into the regular one, which makes sure that the same logic is
used by the functions for picking the level and for actual loop filtering.

Change-Id: Id10c94dccd45f58e5310bacfdf6ee63cbb60b86f
2013-01-07 16:23:58 -08:00
Ronald S. Bultje
3ed14846e1 Remove a few redundant function arguments in encodeframe.c.
Also reindent a block of code that was misindented after addition of
the tx32x32 experiment.

Change-Id: Ic3e4aae3effd8a40136da68c9f382af03632ba08
2013-01-07 11:41:49 -08:00
Ronald S. Bultje
c13d9fef42 Re-enable support for static_threshold (encode_breakout).
Change-Id: Ibd7380f478d3127f9db91d0a4fd2fd0dfde961ab
2013-01-07 11:02:14 -08:00
Ronald S. Bultje
e6216d163a Don't use tx32x32 for macroblocks.
Change-Id: Ib674e0153ca360867ab7a20ba291ac9171a01250
2013-01-07 09:40:19 -08:00
Ronald S. Bultje
c3941665e9 64x64 blocksize support.
3.2% gains on std/hd, 1.0% gains on hd.

Change-Id: I481d5df23d8a4fc650a5bcba956554490b2bd200
2013-01-05 18:20:25 -08:00
Adrian Grange
81d1171fd4 Fix mode selection infinite loop bug
Mode selection for SBs could enter an infinite loop because
the interpolation filter mode index was not being reset
correctly.

Change-Id: I4bbe726f29ef5b6836e94884067c46084713cc11
2013-01-04 09:00:47 -08:00
Paul Wilkins
c6ba3a3d85 Further change to mv reference search.
This experimental change reorders the search so
that all possible references that match the target
reference frame are tested first and these in order
of distance from the current block. These will usually
be the highest scoring candidates.

If we do not find enough good candidates this way
we try non matching cases. These will usually be lower
scoring candidates.

The change in order together with breakouts when
we have found enough candidates should reduce
the computational cost and especially reduce the number
of sort operations.

Quality Results:
Std Hd +0.228%, Hd +0.074%, YT +0.046%, derf +0.137%

This effect is probably due to the fact that more distant
weak candidates are now less likely to get "promoted" over
near candidates even if they are repeated.

Change-Id: Iec37e77d88a48ad0ee1f315b14327a95d63f81f6
2013-01-04 15:18:10 +00:00
Yaowu Xu
df7ce5a711 Merge "make cost_coeffs() and tokenize_b() consistent" into experimental 2013-01-03 09:57:07 -08:00
Yaowu Xu
818f5698fb Merge "Merge cost_coeffs_2x2() into cost_coeffs()" into experimental 2013-01-03 09:33:21 -08:00
Yaowu Xu
83664f457b make cost_coeffs() and tokenize_b() consistent
Change-Id: I7cdb5c32a1400f88ec36d08ea982e38b77731602
2013-01-03 09:31:47 -08:00
Adrian Grange
259b800832 New interpolation filter selection algorithm
Old Scheme:
When SWITCHABLE filter selection is enabled the encoder
evaluates the use of each interpolation filter type and
selects the best one to use at the MB level. A frame-
level flag can be set to force the use of a particular
filter type for all MBs in a frame if it is more efficient
to encode that way. The logic here involved a Q dependent
threshold that assumed that the second 8-tap filter was
a high-pass filter. However, this requires a trip around
the recode loop. If the frame-level flag indicates use
of a particular filter, the other filters are not
evaluated in the pick_mode loop.

New Scheme:
Each filter type is evaluated at the MB level and a record
of the best filter is kept, irrespective of what filter
is signaled at the frame-level. Once all MBs have been
encoded, a decision is made as to what frame-level mode
to set for the *next* frame. If one filter is used by 80%
or more of the MBs, then this filter is forced since it
is assumed that this will be more efficient if the
next frame has similar characteristics. i.e. there is a
one-frame lag between measuring the filter selection and
setting the frame-level mode to use.

Change-Id: I6a7e7ced8f27e120fafb99db2dc9c6293f8d20f7
2013-01-03 08:12:43 -08:00
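
The frame-level decision in the new scheme of 259b800832 can be sketched as a simple vote over the per-MB winners. The function name, the filter labels and the `SWITCHABLE` constant are assumptions made for this illustration; only the 80% rule and the one-frame lag come from the commit message.

```python
from collections import Counter

SWITCHABLE = "SWITCHABLE"  # assumed label for "choose per MB"


def next_frame_filter_mode(best_filters, threshold_percent=80):
    """Pick the frame-level filter mode for the *next* frame.

    best_filters holds the best filter recorded for each MB of the frame
    that has just been encoded.
    """
    if not best_filters:
        return SWITCHABLE
    name, count = Counter(best_filters).most_common(1)[0]
    # Integer arithmetic avoids floating point surprises at the boundary.
    if count * 100 >= threshold_percent * len(best_filters):
        return name  # dominant filter is forced on the next frame
    return SWITCHABLE
```

Since the statistics come from the frame just encoded, no extra trip around the recode loop is needed to apply the decision.
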
Yaowu Xu
bd28510ef9 Merge cost_coeffs_2x2() into cost_coeffs()
Remove the special case function cost_coeffs_2x2() and change function
cost_coeffs() to handle the 2nd order haar block as it already handles all
other block types.

Change-Id: I2aac6f81ee0ae9e03d6a8da4f8681d69b79ce41f
2013-01-03 08:00:00 -08:00
Yunqing Wang
37166d5c1e Merge "Switch the order of calculating 2-D inverse transform" into experimental 2013-01-02 11:45:27 -08:00
Yunqing Wang
e9c69ab102 Merge "Skip finding best ref_mvs when the mode is ZEROMV" into experimental 2013-01-02 11:45:19 -08:00
Paul Wilkins
cad4a91429 Change INT64_MAX to LLONG_MAX
This is needed to make the windows build work after
the removal of vp9_type_aliases.h.

Change-Id: I8addf38e9f3c8b864e0e30a8916a26e0264dd02c
2013-01-02 18:06:00 +00:00
Paul Wilkins
313d1100af Added update-able mv-ref probabilities.
Part of NEW_MVREF experiment.
Added update-able probabilities.

Change-Id: I5a4fcf4aaed1d0d1dac980f69d535639a3d59401
2013-01-02 14:22:11 +00:00
Yunqing Wang
0f4de1573a Skip finding best ref_mvs when the mode is ZEROMV
Read mode before calling vp9_find_best_ref_mvs(). If the mode is
ZEROMV, the best ref_mvs are not needed. Then, we can skip calling
vp9_find_best_ref_mvs().

Change-Id: I5baa3658dd3f1c7107211cbbbcf919b4584be2e2
2012-12-27 16:18:53 -08:00
Yunqing Wang
cc80247f16 Switch the order of calculating 2-D inverse transform
The 2-D inverse transform X = M1*Z*Transposed_M2 was calculated
in 2 steps from left to right:
1. Vertical transform: Y = M1*Z
2. Horizontal transform: X = Y*Transposed_M2
In SIMD, a transpose is needed in vertical transform.

Here, switched the calculation order to do it from right to left.
In this way, we could eliminate that transpose by writing the
intermediate results out to their transposed positions.

Change-Id: I34dfe5eb01292f6e363712420d99475e2e81e12c
2012-12-27 14:09:30 -08:00
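
The reordering in cc80247f16 relies on matrix multiplication being associative: (M1*Z)*Transposed_M2 and M1*(Z*Transposed_M2) give the same X. A minimal sketch with small integer matrices follows; the helper names are chosen for this illustration, and the sketch does not model any rounding between the two passes or the SIMD layout that motivates the change.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def inverse_left_to_right(m1, z, m2):
    """Original order: vertical pass first, then horizontal."""
    y = matmul(m1, z)
    return matmul(y, transpose(m2))


def inverse_right_to_left(m1, z, m2):
    """Switched order: horizontal pass first, then vertical."""
    w = matmul(z, transpose(m2))
    return matmul(m1, w)
```

In exact arithmetic the two orders agree, so the choice can be made purely on which one avoids the extra transpose.
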
John Koleszar
5ebe94f9f1 Build fixes to merge vp9-preview into master
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.

Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-26 11:21:09 -08:00
Yunqing Wang
6ee08f3ccf Fix a warning
Fixed the warning: the size of array ‘intermediate_buffer’ can’t
be evaluated [-Wvla].

Change-Id: Ibcffd6969bd71cee0c10f7cf18960e58cd0bd915
2012-12-21 15:26:56 -08:00
Scott LaVarnway
89ac94f8fb Removed mmx versions of vp9_bilinear_predict filters
These filters will not work with VP9.

Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85
2012-12-21 14:41:49 -08:00
John Koleszar
229273391f Merge "add emmintrin_compat.h for builds with gcc < 4" into vp9-preview 2012-12-21 14:21:50 -08:00
Jim Bankoski
ad64ca4494 fixed sizes of global arrays
Change-Id: Ibc077cf1c1da0c86063f88c6d3073c6876989119
2012-12-21 13:09:04 -08:00
John Koleszar
9a7023d2ad Fix MSVS build for removed vp9/common/vp9_onyxd.h
Change-Id: I75ad0b4ca5b53b5bf759cc26a484ec196d275279
2012-12-20 16:14:55 -08:00
James Zern
9dab3ce624 add emmintrin_compat.h for builds with gcc < 4
Change-Id: If7822e6fcd0d3568b934032322b19ba3e401df26
2012-12-20 14:56:13 -08:00
Jim Bankoski
1dffce7f96 add private to assembly files to ensure proper chromebuild
Change-Id: I6e43ca73f35401a974ed8ee27738d4318f09fd37
2012-12-20 09:40:18 -08:00
Deb Mukherjee
08f0c7cc9c New previous coef context experiment
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order.  A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.

Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neighbors is used to derive the context.

Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
2012-12-19 18:49:39 -08:00
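Note on the commit above: a minimal sketch of the "max magnitude of 5 neighbors" strategy. The neighbor table layout, the function names, and especially the mapping from magnitude to context index are assumptions made for illustration; the commit message does not specify them.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_NEIGHBORS 5

/* Largest absolute value among the previously coded neighbors.
 * `neighbors` holds NUM_NEIGHBORS indices into `coefs`, as would be
 * read from a precomputed per-position table (layout is hypothetical). */
static int max_neighbor_magnitude(const int16_t *coefs, const int *neighbors) {
  int max_mag = 0;
  for (int i = 0; i < NUM_NEIGHBORS; ++i) {
    int mag = abs(coefs[neighbors[i]]);
    if (mag > max_mag) max_mag = mag;
  }
  return max_mag;
}

/* Assumed mapping: 0, 1, or "2 and above" as three context classes. */
static int context_from_magnitude(int mag) {
  return mag > 2 ? 2 : mag;
}
```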
Scott LaVarnway
a6b2070d1a Disabled x86inc style assembly functions.... part 2
Missed a file

Change-Id: I33179de6755bc9eda9ad906e4fec6902ace435a5
2012-12-19 14:13:25 -08:00
John Koleszar
05ec800ea4 Use boolcoder API instead of inlining
This patch changes the token packing to call the bool encoder API rather
than inlining it into the token packing function, and similarly removes
a special get_signed case from the detokenizer. This allows easier
experimentation with changing the bool coder as a whole.

Change-Id: I52c3625bbe4960b68cfb873b0e39ade0c82f9e91
2012-12-19 12:52:41 -08:00
Scott LaVarnway
08dabbcee1 Disabled x86inc style assembly functions
Temporary fix for 32-bit mac build errors.

Change-Id: I2038f033cac16ea796097d0edd0f1c3da03246d7
2012-12-19 11:53:43 -08:00
Ronald S. Bultje
4cca47b538 Use standard integer types for pixel values and coefficients.
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).

Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
2012-12-18 15:31:19 -08:00
Yaowu Xu
b41c3583ac Merge "correct logic in cnvcontext experiment for tx32x32" into experimental 2012-12-18 14:23:39 -08:00
Yaowu Xu
c29fb02903 Merge "Problem of over smoothing with intra modes." into vp9-preview 2012-12-18 14:22:19 -08:00
Ronald S. Bultje
5cab8b7a18 Merge "Give 4x4 scan and coef_band tables a _4x4 suffix." into experimental 2012-12-18 14:17:46 -08:00
Ronald S. Bultje
58961c74ea Merge "Remove redundant "Prob" type (it's a duplicate of vp9_prob)." into experimental 2012-12-18 14:17:18 -08:00
Yaowu Xu
de269c8a62 correct logic in cnvcontext experiment for tx32x32
Change-Id: I004ded11983b7fda85793912ebc5c6f266dc5eb5
2012-12-18 13:53:17 -08:00
Yunqing Wang
779c5f28a8 Fix uninitialized warning
Fixed uninitialized warning for txfm_size.

Change-Id: I42b7e802c3e84825d49f34e632361502641b7cbf
2012-12-18 13:19:04 -08:00
Yunqing Wang
e8d610dda0 Fix a warning
Fixed the warning: the size of array ‘intermediate_buffer’ can’t
be evaluated [-Wvla].

Change-Id: Ibcffd6969bd71cee0c10f7cf18960e58cd0bd915
2012-12-18 12:09:46 -08:00
Ronald S. Bultje
8986eb5c26 Give 4x4 scan and coef_band tables a _4x4 suffix.
This matches the names of tables for all other transform sizes.

Change-Id: Ia7681b7f8d34c97c27b0eb0e34d490cd0f8d02c6
2012-12-18 10:49:10 -08:00
Ronald S. Bultje
ebb5f2f7bd Remove redundant "Prob" type (it's a duplicate of vp9_prob).
Change-Id: I9548891d7b8ff672a31579bcdce74e4cea529883
2012-12-18 10:38:12 -08:00
John Koleszar
1306ba7659 Remove vp9_type_aliases.h
Prefer the standard fixed-size integer typedefs.

Change-Id: Iad75582350669e49a8da3b7facb9c259e9514a5b
2012-12-17 11:32:37 -08:00
Yaowu Xu
0405cd8e9f fixed a warning
where variable is used without initialization

Change-Id: Ic6b52623802641060cad4a72271050aeaf20ad5c
2012-12-17 11:11:07 -08:00
Paul Wilkins
d8f5d1b257 Problem of over smoothing with intra modes.
In some cases intra modes in inter frames give
an over smoothed appearance. Especially with
noisy but flat content.

Also in some cases there were problems with key
frame sizing again with very flat but noisy content.

These are temporary changes to help alleviate the
visual problems but will almost certainly hurt metric
results especially at the very low data rate end.

Change-Id: I11549179a19277ffc283d9788bc70168f2a8bdc9
2012-12-17 11:54:17 +00:00
Yaowu Xu
6247b239bc reset segment map on key frame
This is to fix a decoder crash when the decoder skips a number of frames
to continue decoding from a later key frame.

Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
2012-12-14 06:35:32 -08:00
Yaowu Xu
f8ff3e5d47 prevents redefine of INT64_MAX
MSVC 2010 (_MSC_VER=1600) introduced the definition; this commit
prevents the redefinition of the macro.

Change-Id: I7de92e7e9e865a342f2bcc4b071f8d3c9b2a508c
2012-12-13 16:09:52 -08:00
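Note on the commit above: the actual guard is not shown in this log. A common pattern, sketched here as an assumption, is to define the macro only when no header has already provided it.

```c
#include <stdint.h>

/* Define INT64_MAX only if the toolchain's headers did not already.
 * With a conforming <stdint.h> this block is skipped entirely. */
#ifndef INT64_MAX
#define INT64_MAX 9223372036854775807LL
#endif

/* Small helper so the macro is exercised at run time. */
static int is_int64_max(long long v) {
  return v == INT64_MAX;
}
```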
Yaowu Xu
fd6f492604 remove floating point inverse transforms
Change-Id: I9c651bd7c161974bf5f929446361b00d85e57a3f
2012-12-13 16:02:25 -08:00
Yaowu Xu
2b9ec585d6 fixed an encoder/decoder mismatch
The mismatch was caused by an improper merge of cleanup code around
tokenize_b() and stuff_b() with TX32X32 experiment.

Change-Id: I225ae62f015983751f017386548d9c988c30664c
2012-12-13 15:33:21 -08:00
Yaowu Xu
c681887652 fixed build issue with round()
not defined in msvc

Change-Id: I8fe8462a0c2f636d8b43c0243832ca67578f3665
2012-12-13 15:15:56 -08:00
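Note on the commit above: the log does not say how round() was replaced. One portable option, shown here purely as an assumption, is a cast-based helper that rounds half away from zero. The name round_to_int is hypothetical, and the helper is only valid for values that fit in an int.

```c
/* Rounds to the nearest integer, halves away from zero, without
 * calling round(). The cast truncates toward zero after the offset. */
static int round_to_int(double x) {
  return (int)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```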
Deb Mukherjee
7fa3deb1f5 Build fixes with the super blocks and 32x32 expts
Change-Id: I3c751f8d57ac7d3b754476dc6ce144d162534e6d
2012-12-13 12:18:38 -08:00
Deb Mukherjee
9c318ee371 Merge "Further improvements on the hybrid dwt/dct expt" into experimental 2012-12-13 11:04:56 -08:00
Deb Mukherjee
210dc5b2db Further improvements on the hybrid dwt/dct expt
Modifies the scanning pattern and uses a floating point 16x16
dct implementation for now to handle scaling better.
Also experiments are in progress with 2/6 and 9/7 wavelets.

Results have improved to within ~0.25% of 32x32 dct for std-hd
and about 0.03% for derf. This difference can probably be bridged by
re-optimizing the entropy stats for these transforms. Currently
the stats used are common between 32x32 dct and dwt/dct.

Experiments are in progress with various scan pattern - wavelet
combinations.

Ideally the subbands should be tokenized separately, and an
experiment will be conducted next on that.

Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110
2012-12-13 10:37:49 -08:00
Ronald S. Bultje
f4608e3606 Merge "New default coefficient/band probabilities." into experimental 2012-12-13 09:56:50 -08:00
Ronald S. Bultje
5a5df19de3 New default coefficient/band probabilities.
Gives 0.5-0.6% improvement on derf and stdhd, and 1.1% on hd. The
old tables basically derive from times that we had only 4x4 or
only 4x4 and 8x8 DCTs.

Note that some values are filled with 128, because e.g. ADST only
ever occurs as Y-with-DC, as does 32x32; 16x16 only ever occurs
as Y-with-DC or as UV (as complement of 32x32 Y); and 8x8 Y2 only
ever has 4 coefficients max. If preferred, I can add values of
other tables in their place (e.g. use 4x4 2nd order high-frequency
probabilities for 8x8 2nd order), so that they make at least some
sense if we ever implement a larger 2nd order transform for the
8x8 DCT (etc.); please let me know.

Change-Id: I917db356f2aff8865f528eb873c56ef43aa5ce22
2012-12-12 16:23:57 -08:00
Scott LaVarnway
b575394e21 Improved vp9_ihtllm_c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function.  For the 1080p test clip used, the decoder
performance improved by 17%.

Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
2012-12-12 15:49:39 -08:00
Ronald S. Bultje
39de1e14ed Merge "Consistently use get_prob(), clip_prob() and newly added clip_pixel()." into experimental 2012-12-12 10:34:14 -08:00
Ronald S. Bultje
4d0ec7aacd Consistently use get_prob(), clip_prob() and newly added clip_pixel().
Add a function clip_pixel() to clip a pixel value to the [0,255] range
of allowed values, and use this wherever appropriate (e.g. prediction,
reconstruction). Likewise, consistently use the recently added function
clip_prob(), which calculates a binary probability in the [1,255] range.
If possible, try to use get_prob() or its sister get_binary_prob() to
calculate binary probabilities, for consistency.

In some places this changes the binary probability calculation: a
range of places used {255,256}*count0/(total), and all of these now
use (256*count0+(total>>1))/total. This changes the encoding result,
so this patch warrants some extensive testing.

Change-Id: Ibeeff8d886496839b8e0c0ace9ccc552351f7628
2012-12-12 10:01:19 -08:00
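Note on the commit above: a sketch of the helpers as the message describes them, with clip_pixel() clamping to [0,255], clip_prob() clamping to [1,255], and get_prob() using the rounded form (256*count0+(total>>1))/total. The exact signatures and the choice to return 128 for an empty count are assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Clamp a reconstructed value to the allowed pixel range [0,255]. */
static uint8_t clip_pixel(int val) {
  return (uint8_t)(val > 255 ? 255 : (val < 0 ? 0 : val));
}

/* Clamp a probability to [1,255]; 0 is not a usable binary probability. */
static uint8_t clip_prob(int p) {
  return (uint8_t)(p > 255 ? 255 : (p < 1 ? 1 : p));
}

/* Rounded probability of the "0" branch: (256*num + den/2) / den.
 * Returning 128 when den == 0 is an assumption for this sketch. */
static uint8_t get_prob(int num, int den) {
  return den == 0 ? 128 : clip_prob((num * 256 + (den >> 1)) / den);
}

/* Same calculation from the two branch counts. */
static uint8_t get_binary_prob(int n0, int n1) {
  return get_prob(n0, n0 + n1);
}
```

For example, with counts n0 = 1 and n1 = 3 the rounded form gives (256 + 2) / 4 = 64.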
Yaowu Xu
0c35b27689 Merge "clean up tokenize_b() and stuff_b()" into experimental 2012-12-11 13:51:56 -08:00
Yaowu Xu
899f0fc126 clean up tokenize_b() and stuff_b()
Change-Id: I0c1be01aae933243311ad321b6c456adaec1a0f5
2012-12-11 13:32:16 -08:00