Commit Graph

3027 Commits

Author SHA1 Message Date
Adrian Grange
cf49034b14 Merge "Fixed buffer selection for UV in AltRef filtering" 2010-06-30 02:43:47 -07:00
Yunqing Wang
bead039d4d Improve SSE2 loopfilter functions
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.

Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987
2010-06-29 15:23:14 -04:00
Paul Wilkins
1ca39bf26d Further adjustment of RD behaviour with Q and Zbin.
Following conversations with Tim T (Derf) I ran a large number of
tests comparing the existing polynomial expression with a simpler
^2 variant. Though the polynomial was sometimes a little better at
the extremes of Q it was possible to get close for most clips and
even a little better on some.

This code also changes the way the RD multiplier is calculated
when the ZBIN is extended to use a variant of the same ^2
expression.

I hope that this simpler expression will be easier to tune further
as we expand our test set and consider adjustments based on content.

Change-Id: I73b2564346e74d1332c33e2c1964ae093437456c
2010-06-29 12:15:54 +01:00
Yaowu Xu
b62d093efa Improve the accuracy of forward walsh-hadamard transform
Besides the slight improvement in round trip error, this
also fixes a sign bias in the forward transform, so the
round trip errors are evenly distributed between +1s and
-1s. The old bias seemed to work well with the dc sign bias
in the old fdct, which no longer exists in the improved fdct.

Change-Id: I8635e7be16c69e69a8669eca5438550d23089cef
2010-06-28 22:10:48 -07:00
Adrian Grange
aa8fe0d269 Fixed buffer selection for UV in AltRef filtering
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.

Change-Id: Idf94b287ef47a05f060da3e61134a0b616adcb6b
2010-06-28 16:45:06 +01:00
Scott LaVarnway
f1a3b1e0d9 Added first-pass sse2 version of Yaowu's new fdct.
Change-Id: Ib479210067510162879c368428b92690591120b2
2010-06-24 16:40:56 -04:00
Yaowu Xu
d0dd01b8ce Redo the forward 4x4 dct
The new fdct lowers the round trip sum squared error for a
4x4 block to ~0.12, or ~0.008/pixel. For reference, the old
matrix multiply version has average round trip error 1.46
for a 4x4 block.

Thanks to "derf" for his suggestions and references.

Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
2010-06-24 13:17:58 -07:00
Fritz Koenig
a5906668a3 vp8cx : bestsad declared and initialized incorrectly.
bestsad needs to be an int and set to INT_MAX because at the end
of the function it is compared to INT_MAX to determine if there
was a match in the function.

Change-Id: Ie80e88e4c4bb4a1ff9446079b794d14d5a219788
2010-06-24 14:30:48 -04:00
Fritz Koenig
cecdd73db7 vp8cx : bestsad declared and initialized incorrectly.
bestsad should be an int initialized to INT_MAX.  The optimized
SAD function expects a signed value for bestsad to use for comparison
and early loop termination.  When no match is made, which is
determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
2010-06-24 12:18:23 -04:00
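
The INT_MAX sentinel described in the two bestsad commits can be illustrated with a minimal sketch. The helper names (row_sad, best_match) and the flat candidate layout are hypothetical and are not taken from the libvpx sources; the point is only that a signed bestsad starting at INT_MAX serves both as the early-termination limit and as the "no match" return value.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute differences over n bytes,
 * stopping early once the running total exceeds `limit`. */
static int row_sad(const unsigned char *a, const unsigned char *b,
                   int n, int limit)
{
    int sad = 0;
    for (int i = 0; i < n; i++) {
        sad += abs(a[i] - b[i]);
        if (sad > limit)
            return sad; /* already worse than the best so far */
    }
    return sad;
}

/* Hypothetical search over `count` candidates of n bytes each, stored
 * back to back. Returns the best SAD, or INT_MAX when nothing matched;
 * *best_index is -1 in that case. */
static int best_match(const unsigned char *src, const unsigned char *cands,
                      int count, int n, int *best_index)
{
    int bestsad = INT_MAX; /* signed, so it compares cleanly with int SADs */

    assert(count >= 0 && n >= 0);
    *best_index = -1;
    for (int c = 0; c < count; c++) {
        int sad = row_sad(src, cands + (size_t)c * n, n, bestsad);
        if (sad < bestsad) {
            bestsad = sad;
            *best_index = c;
        }
    }
    return bestsad;
}
```

A caller would compare the return value against INT_MAX to find out whether any candidate was accepted.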
John Koleszar
5e34461448 Remove INLINE/FORCEINLINE
These are mostly vestigial, it's up to the compiler to decide what
should be inlined, and this collided with certain Windows platform SDKs.

Change-Id: I80dd35de25eda7773156e355b5aef8f7e44e179b
2010-06-24 09:24:33 -04:00
agrange
a08df4552a Fix breakout thresh computation for golden & AltRef frames
1. Unavailability of each reference frame type should be tested
independently,
2. Also, only the VP8_GOLD_FLAG needs to be tested before setting
golden frame specific thresholds, and only VP8_ALT_FLAG needs
testing before setting thresholds relevant to the AltRef frame.
(Raised by gbvalor, in response to Issue 47)

Change-Id: I6a06fc2a6592841d85422bc1661e33349bb6c3b8
2010-06-21 16:50:59 +01:00
agrange
daa5d0eb3d Changed unary operator from ! to ~
Since the intent is
to reset the appropriate bit in ref_frame_flags not to
test a logic condition. Prior result would always have
been ref_frame_flags being set to 0.
(Issue reported by dgohman, issue 47)

Change-Id: I2c12502ed74c73cf38e98c9680e0249c29e16433
2010-06-21 15:23:51 +01:00
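
The difference between the two operators can be shown with a small sketch. The flag values and function names here are illustrative assumptions, not the libvpx definitions: logical NOT of any nonzero flag is 0, so ANDing with it wipes every bit, whereas bitwise NOT produces a mask that clears only the intended bit.

```c
#include <assert.h>

/* Illustrative flag values (assumptions, not the libvpx constants). */
enum { LAST_FLAG = 1, GOLD_FLAG = 2, ALT_FLAG = 4 };

/* Buggy form: !flag evaluates to 0 for any nonzero flag,
 * so the result is always 0. */
static int clear_flag_buggy(int ref_frame_flags, int flag)
{
    assert(flag != 0);
    return ref_frame_flags & !flag;
}

/* Fixed form: ~flag has every bit set except the flag's bit,
 * so only that bit is reset. */
static int clear_flag_fixed(int ref_frame_flags, int flag)
{
    assert(flag != 0);
    return ref_frame_flags & ~flag;
}
```

With all three flags set (7), clearing GOLD_FLAG gives 0 with the buggy form and 5 with the fixed form.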
agrange
d4b99b8e3a Moved DOUBLE_DIVIDE_CHECK to denominator (was on numerator)
The DOUBLE_DIVIDE_CHECK macro prevents division by 0,
so must be on the denominator to work as intended.

Change-Id: Ie109242d52dbb9a2c4bc1e11890fa51b5f87ffc7
2010-06-21 15:20:52 +01:00
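
A rough sketch of why the guard has to wrap the denominator. DIVIDE_GUARD is a hypothetical stand-in whose definition is an assumption (it nudges a value away from zero by a small epsilon); it is not copied from the libvpx headers.

```c
#include <assert.h>

/* Hypothetical stand-in for the macro: push the value away from zero.
 * The epsilon is an assumption made for this sketch. */
#define DIVIDE_GUARD(x) ((x) < 0 ? (x) - 0.000001 : (x) + 0.000001)

/* Guard on the numerator: the denominator is used as-is, so a zero
 * denominator is still divided by. */
static double ratio_guard_on_numerator(double num, double den)
{
    return DIVIDE_GUARD(num) / den;
}

/* Guard on the denominator: the divisor can never be exactly zero. */
static double ratio_guard_on_denominator(double num, double den)
{
    assert(num == num); /* reject NaN input in this sketch */
    return num / DIVIDE_GUARD(den);
}
```

With a zero denominator, only the second form yields a finite result (about 1000000 times the numerator under the assumed epsilon).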
Timothy B. Terriberry
9f81463454 Fix a linker error on x86-64 Linux when not using a version script.
If the version script produced by the libvpx build system is not
 used when linking a shared library on x86-64 Linux, the constant
 data in the subpel filters produces R_X86_64_32 relocation errors
 due to the use of wrt rip addressing instead of
 wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
 sets the ELF visibility of these symbols to "hidden", which
 allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
 build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46

Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2
2010-06-21 08:19:12 -04:00
Jim Bankoski
220daa00e0 vp8_block_error_xmm: remove unnecessary instructions
Remove a couple instructions from this function which weren't
necessary for correct execution.

Change-Id: Ib649674f140689f7e5c1530c35686241688a3151
2010-06-18 13:34:43 -04:00
John Koleszar
94c52e4da8 cosmetics: trim trailing whitespace
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.

Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-18 13:06:11 -04:00
John Koleszar
c65e8e8e46 Merge "Change bitreader to use a larger window." 2010-06-17 18:08:36 -07:00
Timothy B. Terriberry
c17b62e1bd Change bitreader to use a larger window.
Change bitreading functions to use a larger window which is refilled less
 often.

This makes it cheap enough to do bounds checking each time the window is
 refilled, which avoids the need to copy the input into a large circular
 buffer.
This uses less memory and speeds up the total decode time by 1.6% on an ARM11,
 2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64.

Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving
 the refill loop to the front of vp8dx_decode_bool().
However, having the refill loop between computation of the split values and
 the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to
 memory latency and code size: refilling after normalization duplicates the
 code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases).
Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the
 beginning of each decode step in vp8_decode_mb_tokens() means the latter
 requires an extra refill at the end.
Platform-specific versions could avoid the problem, but would require most of
 detokenize.c to be duplicated.

Change-Id: I16c782a63376f2a15b78f8086d899b987204c1c7
2010-06-15 19:55:14 -07:00
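
The general idea of a wide window with a bounds check at refill time can be sketched as follows. This is a plain MSB-first bit reader, not the VP8 boolean decoder, and every name in it (bit_window, bw_init, bw_fill, bw_read) is hypothetical. It only shows that checking the input bounds once per refill lets the reader work directly on the caller's buffer without copying it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf; /* next unread input byte */
    const uint8_t *end; /* one past the last input byte */
    uint64_t window;    /* unread bits, left-aligned (MSB first) */
    int count;          /* number of valid bits in window */
} bit_window;

static void bw_init(bit_window *bw, const uint8_t *data, size_t size)
{
    bw->buf = data;
    bw->end = data + size;
    bw->window = 0;
    bw->count = 0;
}

/* Top up the window one byte at a time. The bounds check happens here,
 * once per refill, rather than on every bit read. */
static void bw_fill(bit_window *bw)
{
    while (bw->count <= 56 && bw->buf < bw->end) {
        bw->window |= (uint64_t)*bw->buf++ << (56 - bw->count);
        bw->count += 8;
    }
}

/* Read n bits (1..32). Bits past the end of the input read as zero. */
static uint32_t bw_read(bit_window *bw, int n)
{
    uint32_t value;

    assert(n >= 1 && n <= 32);
    if (bw->count < n)
        bw_fill(bw);
    value = (uint32_t)(bw->window >> (64 - n));
    bw->window <<= n;
    bw->count = bw->count > n ? bw->count - n : 0;
    return value;
}
```

Because the window holds up to 64 bits, most reads are served without touching memory, which is the property the commit message relies on.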
Yunqing Wang
9fdfb8e928 Merge "More on "some XMM registers are non-volatile on windows x64 ABI"" 2010-06-15 06:41:54 -07:00
Yunqing Wang
397aad3ec2 More on "some XMM registers are non-volatile on windows x64 ABI"
Add same fix in subpixel_sse2.asm.

Change-Id: Icfda6103cbf74ec43308e96961dd738aa823c14d
2010-06-15 09:11:26 -04:00
John Koleszar
89c8b3dbc6 vp8_cx_iface: set default cpu used to 0
Change-Id: I7b35f4717cdd204224112f72471b551617262417
2010-06-14 17:28:15 -04:00
Guillermo Ballester Valor
5a72620de9 Fix compiler warnings
Change-Id: I2a97f08cc3c7808ce5be39e910cc5147ecf03a1d
2010-06-14 17:23:49 -04:00
Scott LaVarnway
48c84d138f sse2 version of vp8_regular_quantize_b
Added sse2 version of vp8_regular_quantize_b which improved encode
performance (for the clip used) by ~10% for 32 bit builds and ~3% for
64 bit builds.

Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.

Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
2010-06-14 14:07:56 -04:00
Paul Wilkins
99c5745760 Merge "Use local pointer to pbi->common." 2010-06-14 09:55:02 -07:00
John Koleszar
900d0548db Merge "Make this/next iiratio unsigned." 2010-06-13 14:35:21 -07:00
Paul Wilkins
5ef25a9728 Merge "Tuning of baseline Rd equation to improve behavior at the" 2010-06-13 04:01:46 -07:00
Paul Wilkins
b99d89d0bf Merge "Incorrect comment." 2010-06-13 04:01:01 -07:00
John Koleszar
cd475da8ed Make this/next iiratio unsigned.
This patch addresses issue #79, which is a regression since commit
28de670 "Fix RD bug." If the coded error value is zero, the iiratio
calculation effectively multiplies by 1000000 via the
DOUBLE_DIVIDE_CHECK macro. This can result in a value larger than
INT_MAX, giving a negative ratio. Since the error values are
conceptually unsigned (though they're stored in a double) this patch
makes the iiratio values unsigned, which allows the clamping to work
as expected.
2010-06-12 14:11:51 -04:00
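
The overflow can be illustrated with a sketch that uses a slightly different technique from the patch: instead of relying on the unsigned type alone, it clamps in the double domain before converting, so the conversion never sees an out-of-range value. DIVIDE_GUARD, clamped_ratio and max_ratio are hypothetical names, and the epsilon is an assumption chosen to match the 1000000x effect described in the message.

```c
#include <assert.h>

/* Hypothetical stand-in for the divide guard (assumed epsilon). */
#define DIVIDE_GUARD(x) ((x) < 0 ? (x) - 0.000001 : (x) + 0.000001)

/* Compute intra_error / coded_error as an unsigned ratio, clamped to
 * max_ratio. Both errors are treated as conceptually unsigned. */
static unsigned int clamped_ratio(double intra_error, double coded_error,
                                  unsigned int max_ratio)
{
    double ratio;

    assert(intra_error >= 0.0 && coded_error >= 0.0);
    ratio = intra_error / DIVIDE_GUARD(coded_error);

    /* Clamp while still a double: with coded_error == 0 the ratio is
     * about 1000000 * intra_error, which may not fit in an int. */
    if (ratio >= (double)max_ratio)
        return max_ratio;
    return (unsigned int)ratio;
}
```

With intra_error 5000 and coded_error 0, the raw ratio is about 5e9, beyond INT_MAX, and the function returns the clamp value rather than a negative number.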
John Koleszar
00d566eae1 Merge "require --enable-psnr to build ssim" 2010-06-12 07:10:39 -07:00
John Koleszar
59c50966ac Enable vp8_sad16x16x4d_sse3 in non-RTCD case
Typo caused C version of 16x16x4 SAD to be called when built with
--disable-runtime-cpu-detect.

Change-Id: I0fe6fa67280b3a5f13acb3c8ed914f039aaaf316
2010-06-11 13:15:30 -04:00
John Koleszar
9099fc0d69 require --enable-psnr to build ssim
ssim.c compiles in a huge (512M) amount of global scratch space. Allocating
this data on the heap would be a better solution, but this file doesn't
need to be built at all in most cases, so as a first pass, disable it
except when doing opsnr.stt output (--enable-psnr).

Change-Id: I320d812f6d652a12516a16b52295ebff20b5bd42
2010-06-11 13:05:08 -04:00
Makoto Kato
63ea8705eb some XMM registers are non-volatile on windows x64 ABI
XMM6 to XMM15 are non-volatile on Windows x64 ABI.  We have to save
these registers.

Change-Id: I4676309f1350af25c8a35f0c81b1f0499ab99076
2010-06-11 12:11:15 -04:00
Paul Wilkins
20f7332b34 Incorrect comment.
(Thanks to Ronald S. Bultje)
2010-06-11 16:12:45 +01:00
Paul Wilkins
7a81b29d38 Use local pointer to pbi->common. 2010-06-11 15:17:57 +01:00
Paul Wilkins
f6a58d620d Tuning of baseline Rd equation to improve behavior at the
low and high Q ends.
2010-06-11 15:10:51 +01:00
Yunqing Wang
8389f1967c Merge "Improve vp8_sixtap_predict functions" 2010-06-11 06:48:52 -07:00
John Koleszar
fb220d257b replace while(0) construct with if/else
No good reason to be tricky here. I don't know why 'break' occurred to me
as the natural replacement for the 'return', but an if/else block is
definitely clearer.

Change-Id: I08a336307afeb0dc7efa494b37398f239f66c2cf
2010-06-10 20:15:21 -04:00
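
The changed code is not shown in this listing, so the following is only a generic illustration of the pattern, with hypothetical function names: a do/while(0) block that uses 'break' as a stand-in for an early return, next to the equivalent if/else.

```c
/* "Tricky" form: break out of a single-pass loop to skip the rest. */
static int pick_with_while0(int have_fast, int fast, int slow)
{
    int result;

    do {
        if (have_fast) {
            result = fast;
            break; /* plays the role of the former 'return' */
        }
        result = slow;
    } while (0);

    return result;
}

/* Clearer form: the same control flow written as if/else. */
static int pick_with_if_else(int have_fast, int fast, int slow)
{
    int result;

    if (have_fast)
        result = fast;
    else
        result = slow;

    return result;
}
```

Both functions return the same value for every input; the second simply states the two branches directly.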
Timothy B. Terriberry
05c6eca4db Fix new MV clamping scheme for chroma MVs.
The new scheme introduced in I68d35a2f did not clamp chroma MVs in the SPLITMV
 case, and clamped them incorrectly (to the luma plane bounds) in every other
 case.
Because chroma MVs are computed from the luma MVs before clamping occurs, they
 could still point outside of the frame buffer and cause crashes.
This clamping happens outside of the MV prediction loop, and so should not
 affect bitstream decoding.
2010-06-10 18:42:24 -04:00
John Koleszar
317a66693b Remove reference to 'vpx Technologies'
Vestigial.

Change-Id: Iffa9e6d5ba5199b136d7549890101da17c11e3c3
2010-06-10 12:08:01 -04:00
Yunqing Wang
8873a93811 Improve vp8_sixtap_predict functions
Restructure vp8_sixtap_predict functions to eliminate extra 5-line
calculation while doing first-pass only. Also, combine functions
to eliminate usage of intermediate buffer. This gives decoder a 3%
performance gain on my test clips.

Change-Id: I13de49638884d1a57d0855c63aea719316d08c1b
2010-06-10 11:48:48 -04:00
Paul Wilkins
10ae99c67b Merge "Adjust to avoid long line" 2010-06-10 03:24:54 -07:00
Paul Wilkins
a04ed23ff5 Adjust to avoid long line 2010-06-10 11:15:05 +01:00
Paul Wilkins
cd715faa50 Merge "Correct comment" 2010-06-10 03:05:32 -07:00
Paul Wilkins
ae244efb85 Merge "Fix RD bug." 2010-06-10 03:04:45 -07:00
John Koleszar
f6f0ffe96a Merge "Remove secondary mv clamping from decode stage" 2010-06-09 17:55:57 -07:00
John Koleszar
3085025fa1 Remove secondary mv clamping from decode stage
This patch removes the secondary MV clamping from the MV decoder. This
behavior was consistent with limits placed on non-split MVs by the
reference encoder, but was inconsistent with the MVs generated in the
split case.

The purpose of this secondary clamping was only to prevent crashes on
invalid data. It was not intended to be a behaviour an encoder could or
should rely on. Instead of doing additional clamping in a way that
changes the entropy context, the secondary clamp is removed and the
border handling is made implementation specific. With respect to the
spec, the border is treated as essentially infinite, limited only by
the clamping performed on the near/nearest reference and the maximum
encodable magnitude of the residual MV.

This does not affect any currently produced streams.

Change-Id: I68d35a2fbb51570d6569eab4ad233961405230a3
2010-06-09 11:47:24 -04:00
Yaowu Xu
3225b893e8 minor cleanup of quantizer and fdct code
Change-Id: I7ccc580410bea096a70dce0cc3d455348d4287c5
2010-06-08 15:13:50 -07:00
Yaowu Xu
4bb895e854 fix a typo
Change-Id: I180a05ad57ee6164a6a169ee08e8affd09671eee
2010-06-08 09:37:01 -07:00
Paul Wilkins
6702a4047d Correct comment 2010-06-08 09:59:57 +01:00
Paul Wilkins
28de670cd9 Fix RD bug. 2010-06-07 17:34:46 +01:00
Philip Jägenstedt
0dd78af3e9 remove unreferenced variable i 2010-06-07 11:35:33 -04:00
Yaowu Xu
60254794bc Merge "Remove duplicate and unused functions" 2010-06-07 07:42:22 -07:00
Yaowu Xu
854c007a77 Remove duplicate and unused functions
Change-Id: I944035e720ef834561a9da0d723879a4f787312c
2010-06-07 07:41:07 -07:00
John Koleszar
7aa97a35b5 shared library support (.so)
This patch adds support for building shared libraries when configured
with the --enable-shared switch.

Building DLLs would require more invasive changes to the sample
utilities than I want to make in this patch, since on Windows you can't
use the address of an imported symbol in a static initializer. The best
way to work around this is probably to build the codec interface mapping
table with an init() function, but dll support is of questionable value
anyway, since most windows users will probably use a media framework
lib like webmdshow, which links this library in statically.

Change-Id: Iafb48900549b0c6b67f4a05d3b790b2643d026f4
2010-06-05 16:47:23 -04:00
John Koleszar
09202d8071 LICENSE: update with latest text
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-06-04 16:19:40 -04:00
Yaowu Xu
cbf12db901 Merge "Remove un-necessary memory initialization" 2010-06-01 19:20:37 -07:00
John Koleszar
0952acb79a setup experimental infrastructure
This patch creates some basic infrastructure for doing bitstream-
incompatible changes to the VP8 encoder. The key parts are:

 - --enable-experimental configure switch, to enable support for this
   incompatible bitstream. This switch is required to be set to enable
   any "experiments"

 - A list for "experiments" which translate into --enable-<experiment>
   options and CONFIG_<experiment> macros.

 - The high bit of the "Version" field is used to indicate that the
   bitstream was produced by an experimental encoder. The decoder will
   fail to decode an experimental bitstream without
   --enable-experimental.

 - A new "vp8x" encoder interface is created to set the experimental
   bit.

 - The vp8x encoder interface is made the default for ivfenc in
   experimental mode.

Change-Id: Idbdd5eae4cec5becf75bb4770837dcd256b2abef
2010-06-01 11:14:33 -04:00
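
The version-bit check in the third bullet might look roughly like the sketch below. The 3-bit width of the "Version" field and all names are assumptions made for illustration; the patch itself is not shown in this listing.

```c
/* Assumption for this sketch: the version field is 3 bits wide,
 * so its high bit has the value 4. */
enum {
    VERSION_BITS = 3,
    EXPERIMENTAL_BIT = 1 << (VERSION_BITS - 1)
};

/* True when the high bit of the version field is set. */
static int is_experimental(unsigned int version)
{
    return (version & EXPERIMENTAL_BIT) != 0;
}

/* A decoder built without experimental support refuses streams that
 * have the bit set; one built with it accepts both kinds. */
static int can_decode(unsigned int version, int experimental_enabled)
{
    return experimental_enabled || !is_experimental(version);
}
```

Under these assumptions, versions 0 to 3 decode everywhere, while versions 4 to 7 decode only when experimental support is enabled.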
Yunqing Wang
d33bf3d664 Remove costly memory reads/writes in vp8_reset_mb_tokens_context()
Tests on x86 showed this function cost 2.7% of total decoding time
because of all the memory reads/writes. After modification, it only
costs about 0.7% of decoding time, which gives a 2% gain.

Change-Id: I5003ee30b6dc6dea0bfa42a6ad7e7c22fcc7b215
2010-06-01 07:59:50 -04:00
Yaowu Xu
66f9864a38 Remove un-necessary memory initialization
The intra prediction needs one line above at the top edge.
2010-05-29 22:59:31 -07:00
John Koleszar
1689564bb5 Merge "expose vp8_deblock" 2010-05-28 08:49:04 -07:00
Luca Barbato
e7876abb2c expose vp8_deblock
it is used by vp8/encoder/onyx_if.c

fixes:
vp8/encoder/onyx_if.c:5189: warning: implicit declaration of function
‘vp8_deblock’
2010-05-28 10:37:43 +02:00
Yaowu Xu
a7bb3360bc Fix stats format and correct data size and bit rate output
Change-ID: I093abe6094589a0d73f6ca85b825678a19e68285
2010-05-27 19:56:18 -07:00
Yaowu Xu
8caa5c2d30 Increase the size of output packet list
This is to accommodate output packets for both compressed
data and psnr stats. For each frame, there is at least
one packet for compressed data and one for psnr stats. For
a max lag of 25, 64 is large enough to cover all lagged
frames at the end of encoding.

Change-Id: If20787fbc86f96e1aa16a3ccf2adc93e6c1e3d5f
2010-05-27 19:44:15 -07:00
Paul Wilkins
57d59f6ee7 Merge "Correct bit allocation when the alternative reference frame" 2010-05-26 09:06:49 -07:00
John Koleszar
0270a790d7 Merge "vpx_image: add VPX_ prefix to PLANE_*" 2010-05-25 13:00:51 -07:00
John Koleszar
4c627f5697 remove references to vp8/vp8.h
This file was moved to vpx/, currently this reference breaks the MSVS build.

Change-Id: I2c90a7a1c09cb66055e3daf84facefcaee1085a1
2010-05-25 10:17:35 -04:00
Paul Wilkins
ea4b6f18cb Correct bit allocation when the alternative reference frame
is constructed from multiple source frames

Change-Id: I2e026c10d02b071b401c9fe8ab8dcfc0ac306103
2010-05-25 14:26:26 +01:00
John Koleszar
b6c71918ae vpx_image: add VPX_ prefix to PLANE_*
The PLANE_{PACKED,Y,U,V,ALPHA} macros should be renamed to be within the
VPX_ namespace.

Fixes #27
2010-05-24 21:45:05 -04:00
John Koleszar
b7492341ac install includes in DIST_DIR/include/vpx, move vpx_codec/ to vpx/
This renames the vpx_codec/ directory to vpx/, to allow applications
to more consistently reference these includes with the vpx/ prefix.
This allows the includes to be installed in /usr/local/include/vpx
rather than polluting the system includes directory with an
excessive number of includes.

Change-Id: I7b0652a20543d93f38f421c60b0bbccde4d61b4f
2010-05-24 20:27:42 -04:00
John Koleszar
6be1d9337e Merge "Fixed an encoder debug/relese mismatch in x86_64-win64-vs8" 2010-05-24 11:07:13 -07:00
Yunqing Wang
ad6a9d4e50 Fixed minor bug for realtime-only building 2010-05-24 11:30:04 -04:00
James Zern
6cd4a10e16 Put img_fmt in the vpx namespace
Avoid any potential name clashes and match other external types.
s/IMG_FMT/VPX_$&/g
s/img_fmt/vpx_$&/g

Change-Id: Ia7ad5bbb6424416b37e71e5f5eb1eca31c3c707f
2010-05-21 09:19:13 -04:00
John Koleszar
1df0314e7b configure: remove HAVE_CONFIG_H
This doesn't play well with autotools, and the preprocessor magic is
confusing and unhelpful in the vp8-only context.

Change-Id: I2fcb57e6eb7876ecb58509da608dc21f26077ff1
2010-05-21 05:53:48 -04:00
Paul Wilkins
c012d63ec9 Fixed incorrect casts that broke rate control in some situations. 2010-05-20 16:49:39 +01:00
Yaowu Xu
c15652bce1 Fixed an encoder debug/relese mismatch in x86_64-win64-vs8
Visual c++ compiler uses xmm registers for floating point
operations for 64 bit architecture, therefore its calling
convention requires the preservation of xmm6-xmm15 in any
function that has used these registers. However, the sse2
functions, that were originally written for 32 bit windows,
may have used xmm6 and xmm7 without preserving the content.
In this particular case, the compiler used xmm6 to save
the variable "two_pass_min_rate", the value of the variable
is mucked up by our sse2 optimized loop filter functions,
hence the results of release/debug mismatching.
2010-05-19 15:48:00 -07:00
Pavol Rusnak
0fc9abfbfd remove unneeded variables 2010-05-19 21:15:32 +02:00
John Koleszar
0ea50ce9cb Initial WebM release 2010-05-18 11:58:33 -04:00