This function was accessing values below the stack pointer, which
can be corrupted by signal delivery at any time.
Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
previous implementation compared each set of values to limit and then
&'d them together, requiring a compare and & for each value.
this does the accumulation first, requiring only one compare
Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323
reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.
This patch is needed to build with --disable-vp8-decoder.
Change-Id: I568c52221a2b309234d269675cba97131ce35c86
Movdqu is more expensive (throughput, uops) than movq. Minimal
impact for newer big cores, but ~2.25% gain on Atom.
Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.
Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.
The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.
Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.
Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.
Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.
Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK. Also reduced the size of other members
of the MB_MODE_INFO struct. For 1080p, the memory was reduced
by 1,209,516 bytes. The decoder performance appeared to improve
by 3% for the clip used.
Note: The main goal for this change is to improve the decoder
performance. The encoder will be revisited at a later date for
further structure cleanup.
Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
Remove the dependency on postproc.c for the encoder in general, the only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.
Additionally, when VP8_SET_POSTPROC is used with the encoder when post
processing has been disabled an error will be returned.
This addresses issue #153.
Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit) This layout should be more
cache friendly.
As a result of this change, the encoder had to be updated.
Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
The memory being zeroed in vp8_update_mode_info_border() was just
allocated with calloc, and so the entire function is actually
redundant, but it should be made correct in case someone expects
it to actually work in the future.
Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data. This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.
SSE2 implementation gives 3% gain on Atom.
Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
move some things around, reorder some instructions
constant 0 is used several times. load it once per call in horiz,
once per loop in vert.
separate saturating instructions to avoid stalls.
just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0
document some stalls for further consideration
Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.
Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.
Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
asm_offsets contains some definitions which are no longer used. this
was one of them. v6 build works now
Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result,
a large number compile errors had to be fixed.
Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.
Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5