Change bitreading functions to use a larger window which is refilled less
often.
This makes it cheap enough to do bounds checking each time the window is
refilled, which avoids the need to copy the input into a large circular
buffer.
This uses less memory and speeds up the total decode time by 1.6% on an ARM11,
2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64.
Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving
the refill loop to the front of vp8dx_decode_bool().
However, having the refill loop between computation of the split values and
the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to
memory latency and code size: refilling after normalization duplicates the
code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases.
Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the
beginning of each decode step in vp8_decode_mb_tokens() means the latter
requires an extra refill at the end.
Platform-specific versions could avoid the problem, but would require most of
detokenize.c to be duplicated.
Change-Id: I16c782a63376f2a15b78f8086d899b987204c1c7
vs8
- pull yasm.rules [1] into the source tree to avoid need to install
file into VC/VCProjectDefaults
- reference same w/ToolFile & RelativePath
- update arm branch to match
vs7:
- quote source file paths passed to yasm
[1]:
http://www.tortall.net/svn/yasm/trunk/yasm/Mkfiles/vc9/yasm.rules@2271
Change-Id: I52b801496340cd7b1d0023d12afbc04624ecefc3
Added sse2 version of vp8_regular_quantize_b which improved encode
performance(for the clip used) by ~10% for 32 bit builds and ~3% for
64 bit builds.
Also updated SHADOW_ARGS_TO_STACK to allow for more than 9 arguments.
Change-Id: I62f78eabc8040b39f3ffdf21be175811e96b39af
The RSA Data Security, Inc. implementation license bears a requirement
similar to the old problematic BSD license with advertising clause.
Change-Id: I877b71ff0548934b1c4fd87245696f53dedbdf26
Commit 3245d46 "ivfenc: support reading/writing from a pipe" broke
support for two pass encodes of raw files when done in two
invocations of ivfenc. The raw image was only set up on pass 0,
which was never hit when running with --pass=2 --passes=2.
Change-Id: I6a9858be1a8998d5bd45331123b46b1baa05b379
This patch addresses issue #79, which is a regression since commit
28de670 "Fix RD bug." If the coded error value is zero, the iiratio
calculation effectively multiplies by 1000000 by the
DOUBLE_DIVIDE_CHECK macro. This can result in a value larger than
INT_MAX, giving a negative ratio. Since the error values are
conceptually unsigned (though they're stored in a double) this patch
makes the iiratio values unsigned, which allows the clamping to work
as expected.
ssim.c comiles in a huge (512M) amount of global scratch space. Allocating
this data on the heap would be a better solution, but this file doesn't
need to be built at all in most cases, so as a first pass, disable it
except when doing opsnr.stt output (--enable-psnr).
Change-Id: I320d812f6d652a12516a16b52295ebff20b5bd42
No good reason to be tricky here. I don't know why 'break' occurred to me
as the natrual replacement for the 'return', but an if/else block is
definitely clearer.
Change-Id: I08a336307afeb0dc7efa494b37398f239f66c2cf
The new scheme introduced in I68d35a2f did not clamp chroma MVs in the SPLITMV
case, and clamped them incorrectly (to the luma plane bounds) in every other
case.
Because chroma MVs are computed from the luma MVs before clamping occurs, they
could still point outside of the frame buffer and cause crashes.
This clamping happens outside of the MV prediction loop, and so should not
affect bitstream decoding.
Restructure vp8_sixtap_predict functions to eliminate extra 5-line
calculation while doing first-pass only. Also, combline functions
to eliminate usage of intermediate buffer. This gives decoder a 3%
performance gain on my test clips.
Change-Id: I13de49638884d1a57d0855c63aea719316d08c1b
Using uname fails e.g. on a 64-bit machine with a 32-bit toolchain.
The following gcc -dumpmachine strings have been verified:
* 32-bit Linux gives i486-linux-gnu
* 64-bit Linux gives x86_64-linux-gnu
* Mac OS X 10.5 gives i686-apple-darwin9
* MinGW gives mingw32
*darwin8* and *bsd* can safely be assumed to be correct, but *cygwin*
is a guess.
Change-Id: I6bef2ab5e97cbd3410aa66b0c4f84d2231884b05
This patch removes the secondary MV clamping from the MV decoder. This
behavior was consistent with limits placed on non-split MVs by the
reference encoder, but was inconsistent with the MVs generated in the
split case.
The purpose of this secondary clamping was only to prevent crashes on
invalid data. It was not intended to be a behaviour an encoder could or
should rely on. Instead of doing additional clamping in a way that
changes the entropy context, the secondary clamp is removed and the
border handling is made implmentation specific. With respect to the
spec, the border is treated as essentially infinite, limited only by
the clamping performed on the near/nearest reference and the maximum
encodable magnitude of the residual MV.
This does not affect any currently produced streams.
Change-Id: I68d35a2fbb51570d6569eab4ad233961405230a3