Commit Graph

171 Commits

Author SHA1 Message Date
Scott LaVarnway
0de458f6b9 Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK.  Also reduced the size of other members
of the MB_MODE_INFO struct.  For 1080p, the memory was reduced
by 1,209,516 bytes.  The decoder performance appeared to improve
by 3% for the clip used.
Note:  The main goal for this change is to improve the decoder
performance.  The encoder will be revisited at a later date for
further structure cleanup.

Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-09-03 16:43:23 -04:00
John Koleszar
4496db45e3 Whitespace: nuke CRLFs
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
2010-09-02 13:33:01 -04:00
Yaowu Xu
fca129203a added separate rounding/zbin constants for 2nd order
This allows experiments of using different rounding and
zerobin constants for 2nd order blocks.

Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
2010-09-02 10:27:03 -04:00
Paul Wilkins
c239a1b67c Improved Force Key Frame Behaviour
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.

The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.

Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
2010-08-31 14:32:40 -04:00
Scott LaVarnway
e85e631504 Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit)  This layout should be more
cache friendly.

As a result of this change, the encoder had to be updated.

Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
2010-08-31 11:24:30 -04:00
John Koleszar
8e7ebacb19 increase rate control buffer level precision
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.

Change-Id: If8dd2a87614c05747f81432cbe75dd9e6ed2f04e
2010-08-20 11:04:48 -04:00
John Koleszar
80d3923a78 move segmentation_common to encoder
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.

Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
2010-08-13 14:54:24 -04:00
Scott LaVarnway
9c7a0090e0 Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD.  As a result,
a large number compile errors had to be fixed.

Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-12 16:25:43 -04:00
John Koleszar
d22e2968a8 cosmetics: add missing 2D array braces
Silences compile warning.

Change-Id: I4b207d97f8570fe29aa2710e4ce4f02e7e43b57a
2010-08-11 13:55:38 -04:00
John Koleszar
392a958274 avoid negative array subscript warnings
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.

Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5
2010-08-11 13:49:12 -04:00
Scott LaVarnway
99f46d62d9 Moved gf_active code to encoder only
The gf_active code is only used by the encoder, so it was moved from
common and decoder.

Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
2010-08-11 11:54:25 -04:00
Yaowu Xu
c404fa42ac Removed duplicate functions
Change-Id: Ie587972ccefd3c762b8cdf8ef39345cd22924b9b
2010-08-10 21:45:34 -07:00
Yaowu Xu
3b95a46c55 Normalize quantizer's zero bin and rounding factors
This patch changes a few numbers in the two constant arrays
for quantizer's zerobin and rounding factors, in general to
make the sum of the two factors for any Q to be 128.  While
it might be beneficial to calibrate the two arrays for best
quantizer performance, it is not the purpose of this patch.
Normalizing the two arrays will enable quick optimization
of the current faster quantizer, i.e .zerobin check can be
removed.

Change-Id: If9abfd7929bf4b8e9ecd64a79d817c6728c820bd
2010-08-10 21:12:04 -07:00
Timothy B. Terriberry
8fa38096a3 Add trellis quantization.
Replace the exponential search for optimal rounding during
 quantization with a linear Viterbi trellis and enable it
 by default when using --best.
Right now this operates on top of the output of the adaptive
 zero-bin quantizer in vp8_regular_quantize_b() and gives a small
 gain.
It can be tested as a replacement for that quantizer by
 enabling the call to vp8_strict_quantize_b(), which uses
 normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
 in order to take advantage of activity masking, since there is
 limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
 quantizer (which is lambda-dependent) loses to
 vp8_regular_quantize_b() alone (which is not) on my test clip.

Patch Set 3:

Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.

Patch Set 2:

Overall, the goal of this patch set is to make "trellis"
search to produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disable it all together
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
the quantization optimization. Patch set 2 restore the
condition to be consistent.
c. Patch set 1 checks quantized level L-1, and L for any
input coefficient was quantized to L. Patch set 2 limits
the candidate coefficient to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.

(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set, without
a, b, trellis quant can hurt the psnr by 0.2 to .3db at
200kbps for some cif clips)
(c gets trellis quant  to match the exponential search
to match at Q0 encoding, without c, trellis quant can be
1.5 to 2db lower for encodings with fixed Q at 0 on most
derf cif clips)

Change-Id:	Ib1a043b665d75fbf00cb0257b7c18e90eebab95e
2010-08-10 20:58:24 -07:00
Jan Kratochvil
0327d3df90 nasm: end labels with colon (':')
Labels should end by colon (':'), nasm requires it.

Provide nasm compatibility.  No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f
2010-08-02 09:20:03 -04:00
Jan Kratochvil
c8134bc54a nasm: use OWORD vs DQWORD
nasm knows only OWORD.  yasm knows both OWORD and DQWORD.

Provide nasm compatibility.  No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.  Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.

Change-Id: I62151390089e90df9a7667822fa594ac20b00e78
2010-08-02 09:17:14 -04:00
Yaowu Xu
f95c80b60f Enable the switch between two versions of quantizer
To facilitate more testing related to quantizer and rate
control, the old version quantizer is added back. old and
new quantizer can be switched back and forth by define or
un-define the macro "EXACT_QUANT".

Change-Id: Ia77e687622421550f10e9d65a9884128a79a65ff
2010-07-28 10:51:34 -07:00
Johann
a570bbd418 x86/sse2: disable asm quantizer
follow up to Change I0e51492d: neon: disable asm quantizer

Now x86 doesn't segfault with --disable-runtime-cpu-detect and -p=2

Change-Id: I8ca127bb299198efebbcbd5a661e81788361933f
2010-07-27 12:54:43 -04:00
John Koleszar
d8009c077a neon: disable asm quantizer
The assembly version of the quantizer has not been updated to match
the new exact quantizer introduced in commit e04e2935. That commit tried
to disable this code but missed the non-RTCD case.

Thanks to David Baker <david.baker at openmarket.com> for isolating the
issue and testing this fix.

Change-Id: I0e51492dc6f8e44d2c10b587427448bf94135c65
2010-07-27 11:16:19 -04:00
Fritz Koenig
0ce3901282 Swap alt/gold/new/last frame buffer ptrs instead of copying.
At the end of the decode, frame buffers were being copied.
The frames are not updated after the copy, they are just
for reference on later frames.  This change allows multiple
references to the same frame buffer instead of copying it.

Changes needed to be made to the encoder to handle this.  The
encoder is still doing frame buffer copies in similar places
where pointer reference could be done.

Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66
2010-07-23 14:53:59 -04:00
Paul Wilkins
68cf24310b Merge commit 'refs/changes/51/351/1' of ssh://review.webmproject.org:29418/libvpx into KfRateBugMerged 2010-07-23 17:45:26 +01:00
Yaowu Xu
f5cf8553a2 Merge "Make the quantizer exact." 2010-07-23 09:26:26 -07:00
Paul Wilkins
9404c7db6d Rate control bug with long key frame interval.
In two pass encodes, the calculation of the number of bits
allocated to a KF group had the potential to overflow for high data
rates if the interval is very long.

We observed the problem in one test clip where there was one
section where there was an 8000 frame gap between key frames.

Change-Id: Ic48eb86271775d7573b4afd166b567b64f25b787
2010-07-23 17:01:12 +01:00
Timothy B. Terriberry
e04e293522 Make the quantizer exact.
This replaces the approximate division-by-multiplication in the
 quantizer with an exact one that costs just one add and one
 shift extra.
The asm versions have not been updated in this patch, and thus
 have been disabled, since the new method requires different
 multipliers which are not compatible with the old method.

Change-Id: I53ac887af0f969d906e464c88b1f4be69c6b1206
2010-07-23 08:48:01 -07:00
Paul Wilkins
d576690ba1 80 character line length on Arnr LUT
Tweaked table to fit to 80 characters.

Change-Id: Ie6ba80e0b31b33e23d2bf78599abe223369fcefb
2010-07-23 16:47:54 +01:00
Yaowu Xu
7a89d4c3d4 Merge "Improve the accuracy of forward walsh-hadamard transform" 2010-07-19 07:50:26 -07:00
Paul Wilkins
0ba32632cd ARNR Lookup Table.
Change submitted for Adrian Grange. Convert threshold
calculation in ARNR filter to a lookup table.

Change-Id: I12a4bbb96b9ce6231ce2a6ecc2d295610d49e7ec
2010-07-19 14:46:42 +01:00
Paul Wilkins
bf18069ceb Rate control fix for ARNR filtered frames.
Previously we had assumed that it was necessary to give a full frame's
bit allocation to the alt ref frame if it has been created through temporal
filtering. This is not the case. The active max quantizer control
insures that sufficient bits are allocated if needed and allocating a
full frame's worth of bits creates an excessive overhead for the ARF.

Change-Id: I83c95ed7bc7ce0e53ccae6ff32db5a97f145937a
2010-07-19 14:10:07 +01:00
Paul Wilkins
7c938f4d3c Fix: Incorrect 'cols' calculation in temporal filter.
Change-Id: I37f10fbe4fbb505c1d34980a59af3e817c287e22
2010-07-16 15:57:17 +01:00
Yaowu Xu
3d0a1edadd Fix a compiling error on armv6
The issue was caused by a bad merge in Change I5559d1e8

Change-Id: I6563f652bc1500202de361f8f51d11cc6ddf3331
2010-07-07 14:45:13 -04:00
Adrian Grange
0618ff14d6 Fix bug in 1st pass motion compensation
In the case where the best reference mv is not (0,0) a secondary
search is carried out centered on (0,0). However, rather than
sending tmp_err into the search function, motion_error was
inadvertently passed.

As a result tmp_err remains set at INT_MAX and the (0,0)-based
search result will never be selected, even if it is better.

Change-Id: I3c82b246c8c82ba887b9d3fb4c9e0a0f2fe5a76c
2010-07-01 14:19:43 +01:00
Paul Wilkins
2e3d8d3263 Merge "Further adjustment of RD behaviour with Q and Zbin." 2010-07-01 01:53:40 -07:00
Paul Wilkins
1ca39bf26d Further adjustment of RD behaviour with Q and Zbin.
Following conversations with Tim T (Derf) I ran a large number of
tests comparing the existing polynomial expression with a simpler
^2 variant. Though the polynomial was sometimes a little better at
the extremes of Q it was possible to get close for most clips and
even a little better on some.

This code also changes the way the RD multiplier is calculated
when the ZBIN is extended to use a variant of the same ^2
expression.

I hope that this simpler expression will be easier to tune further
as we expand our test set and consider adjustments based on content.

Change-Id: I73b2564346e74d1332c33e2c1964ae093437456c
2010-06-29 12:15:54 +01:00
Yaowu Xu
b62d093efa Improve the accuracy of forward walsh-hadamard transform
Besides the slight improvement in round trip error. This
also fixes a sign bias in the forward transform, so the
round trip errors are evenly distributed between +1s and
-1s. The old bias seemed to work well with the dc sign bias
in old fdct,  which no longer exist in the improved fdct.

Change-Id: I8635e7be16c69e69a8669eca5438550d23089cef
2010-06-28 22:10:48 -07:00
Adrian Grange
aa8fe0d269 Fixed buffer selection for UV in AltRef filtering
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.

Change-Id: Idf94b287ef47a05f060da3e61134a0b616adcb6b
2010-06-28 16:45:06 +01:00
Scott LaVarnway
f1a3b1e0d9 Added first-pass sse2 version of Yaowu's new fdct.
Change-Id: Ib479210067510162879c368428b92690591120b2
2010-06-24 16:40:56 -04:00
Yaowu Xu
d0dd01b8ce Redo the forward 4x4 dct
The new fdct lowers the round trip sum squared error for a
4x4 block ~0.12. or ~0.008/pixel. For reference, the old
matrix multiply version has average round trip error 1.46
for a 4x4 block.

Thanks to "derf" for his suggestions and references.

Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
2010-06-24 13:17:58 -07:00
Fritz Koenig
a5906668a3 vp8cx : bestsad declared and initialized incorrectly.
bestsad needs to be a int and set to INT_MAX because at the end
of the function it is compared to INT_MAX to determine if there
was a match in the function.

Change-Id: Ie80e88e4c4bb4a1ff9446079b794d14d5a219788
2010-06-24 14:30:48 -04:00
Fritz Koenig
cecdd73db7 vp8cx : bestsad declared and initialized incorrectly.
bestsad should be an int initialized to INT_MAX.  The optimized
SAD function expects a signed value for bestsad to use for comparison
and early loop termination.  When no match is made, which is
determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
2010-06-24 12:18:23 -04:00
agrange
a08df4552a Fix breakout thresh computation for golden & AltRef frames
1. Unavailability of each reference frame type should be tested
independently,
2. Also, only the VP8_GOLD_FLAG needs to be tested before setting
golden frame specific thresholds, and only VP8_ALT_FLAG needs
testing before setting thresholds relevant to the AltRef frame.
(Raised by gbvalor, in response to Issue 47)

Change-Id: I6a06fc2a6592841d85422bc1661e33349bb6c3b8
2010-06-21 16:50:59 +01:00
agrange
daa5d0eb3d Changed unary operator from ! to ~
Since the intent is
to reset the appropriate bit in ref_frame_flags not to
test a logic condition. Prior result would always have
been ref_frame_flags being set to 0.
(Issue reported by dgohman, issue 47)

Change-Id: I2c12502ed74c73cf38e98c9680e0249c29e16433
2010-06-21 15:23:51 +01:00
agrange
d4b99b8e3a Moved DOUBLE_DIVIDE_CHECK to denominator (was on numerator)
The DOUBLE_DIVIDE_CHECK macro prevents from divide by 0,
so must be on the denominator to work as intended.

Change-Id: Ie109242d52dbb9a2c4bc1e11890fa51b5f87ffc7
2010-06-21 15:20:52 +01:00
Jim Bankoski
220daa00e0 vp8_block_error_xmm: remove unnecessary instructions
Remove a couple instructions from this function which weren't
necessary for correct execution.

Change-Id: Ib649674f140689f7e5c1530c35686241688a3151
2010-06-18 13:34:43 -04:00
John Koleszar
94c52e4da8 cosmetics: trim trailing whitespace
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.

Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-18 13:06:11 -04:00
Guillermo Ballester Valor
5a72620de9 Fix compiler warnings
Change-Id: I2a97f08cc3c7808ce5be39e910cc5147ecf03a1d
2010-06-14 17:23:49 -04:00
Scott LaVarnway
48c84d138f sse2 version of vp8_regular_quantize_b
Added sse2 version of vp8_regular_quantize_b which improved encode
performance(for the clip used) by ~10% for 32 bit builds and ~3% for
64 bit builds.

Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.

Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
2010-06-14 14:07:56 -04:00
John Koleszar
900d0548db Merge "Make this/next iiratio unsigned." 2010-06-13 14:35:21 -07:00
Paul Wilkins
5ef25a9728 Merge "Tuning of baseline Rd equation to improve behavior at the" 2010-06-13 04:01:46 -07:00
John Koleszar
cd475da8ed Make this/next iiratio unsigned.
This patch addresses issue #79, which is a regression since commit
28de670 "Fix RD bug." If the coded error value is zero, the iiratio
calculation effectively multiplies by 1000000 by the
DOUBLE_DIVIDE_CHECK macro. This can result in a value larger than
INT_MAX, giving a negative ratio. Since the error values are
conceptually unsigned (though they're stored in a double) this patch
makes the iiratio values unsigned, which allows the clamping to work
as expected.
2010-06-12 14:11:51 -04:00
John Koleszar
59c50966ac Enable vp8_sad16x16x4d_sse3 in non-RTCD case
Typo caused C version of 16x16x4 SAD to be called when built with
--disable-runtime-cpu-detect.

Change-Id: I0fe6fa67280b3a5f13acb3c8ed914f039aaaf316
2010-06-11 13:15:30 -04:00
Paul Wilkins
f6a58d620d Tuning of baseline Rd equation to improve behavior at the
low and high Q ends.
2010-06-11 15:10:51 +01:00
Paul Wilkins
10ae99c67b Merge "Adjust to avoid long line" 2010-06-10 03:24:54 -07:00
Paul Wilkins
a04ed23ff5 Adjust to avoid long line 2010-06-10 11:15:05 +01:00
Paul Wilkins
cd715faa50 Merge "Correct comment" 2010-06-10 03:05:32 -07:00
Paul Wilkins
ae244efb85 Merge "Fix RD bug." 2010-06-10 03:04:45 -07:00
Yaowu Xu
3225b893e8 minor cleanup of quantizer and fdct code
Change-Id: I7ccc580410bea096a70dce0cc3d455348d4287c5
2010-06-08 15:13:50 -07:00
Yaowu Xu
4bb895e854 fix a typo
Change-Id: I180a05ad57ee6164a6a169ee08e8affd09671eee
2010-06-08 09:37:01 -07:00
Paul Wilkins
6702a4047d Correct comment 2010-06-08 09:59:57 +01:00
Paul Wilkins
28de670cd9 Fix RD bug. 2010-06-07 17:34:46 +01:00
Yaowu Xu
854c007a77 Remove duplicate and unused functions
Change-Id: I944035e720ef834561a9da0d723879a4f787312c
2010-06-07 07:41:07 -07:00
John Koleszar
09202d8071 LICENSE: update with latest text
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-06-04 16:19:40 -04:00
Yaowu Xu
a7bb3360bc Fix stats format and correct data size and bit rate output
Change-ID: I093abe6094589a0d73f6ca85b825678a19e68285
2010-05-27 19:56:18 -07:00
Paul Wilkins
57d59f6ee7 Merge "Correct bit allocation when the alternative reference frame" 2010-05-26 09:06:49 -07:00
Paul Wilkins
ea4b6f18cb Correct bit allocation when the alternative reference frame
is constructed from multiple source frames

Change-Id: I2e026c10d02b071b401c9fe8ab8dcfc0ac306103
2010-05-25 14:26:26 +01:00
John Koleszar
b7492341ac install includes in DIST_DIR/include/vpx, move vpx_codec/ to vpx/
This renames the vpx_codec/ directory to vpx/, to allow applications
to more consistently reference these includes with the vpx/ prefix.
This allows the includes to be installed in /usr/local/include/vpx
rather than polluting the system includes directory with an
excessive number of includes.

Change-Id: I7b0652a20543d93f38f421c60b0bbccde4d61b4f
2010-05-24 20:27:42 -04:00
John Koleszar
6be1d9337e Merge "Fixed an encoder debug/relese mismatch in x86_64-win64-vs8" 2010-05-24 11:07:13 -07:00
Yunqing Wang
ad6a9d4e50 Fixed minor bug for realtime-only building 2010-05-24 11:30:04 -04:00
Paul Wilkins
c012d63ec9 Fixed incorrect casts that broke rate control in some situations. 2010-05-20 16:49:39 +01:00
Yaowu Xu
c15652bce1 Fixed an encoder debug/relese mismatch in x86_64-win64-vs8
Visual c++ compiler uses xmm registers for floating point
operations for 64 bit architecture, therefore its calling
convention requires the preservation of xmm6-xmm15 in any
function that have used these registers. However, the sse2
functions, that were originally written for 32 bit windows,
may have used xmm6 and xmm7 without preserving the content.
In this particular case, the compiler used xmm6 to save
the variable "two_pass_min_rate", the value of the variable
is mucked up by our sse2 optimized loop filter functions,
hence the results of release/debug mismatching.
2010-05-19 15:48:00 -07:00
Pavol Rusnak
0fc9abfbfd remove unneeded variables 2010-05-19 21:15:32 +02:00
John Koleszar
0ea50ce9cb Initial WebM release 2010-05-18 11:58:33 -04:00