reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.
This patch is needed to build with --disable-vp8-decoder.
Change-Id: I568c52221a2b309234d269675cba97131ce35c86
Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.
Fixes issue #169
Change-Id: I319f281277033a5e7e3ee3b092b9a87cce2f463d
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. Instead, the state is only
set during splitmv decoding, never cleared.
Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
Movdqu is more expensive (throughput, uops) than movq. Minimal
impact for newer big cores, but ~2.25% gain on Atom.
Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.
Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
The patch related with issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It actually was a trick to confuse compiler rather
than a fix.
This patch fixes it by creating a new macro used when needed just a high
limit check for an unsigned.
Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
dowshift as well as workarounds for a function which only accepted
signed data
looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.
The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.
Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.
Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.
Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
GET_GOT was producing a zero length call. This resulted in
pipeline flushes occuring when returing from the assembly
functions. Masked on out of order cores, but evident on
Atom cores.
Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
Modified code so that:
-When above and left contexts are same and not equal to current segment id, it needs to read a maximum of 2 segment_tree_probabilities.
- When above and left contexts are different and not equal to current segment id, it needs to read only a single segment_tree_probability.
Change-Id: Idc2cf2c4afcc6179b8162ac5a32c948ff5a9a2ba
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.
Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
-Updates by making use of spatial correlation.
-Checks if the segment_id is same as above or left context and encodes only the update to the map instead of updating individual segment_ids.
Change-Id: Ib861df97e8aa2b37516219eeddcdbaf552b6a249
This script is part of a legacy release process and is unsupported. Most
of this functionality has been moved into 'make dist.'
Change-Id: Id67936302083352b628869e2988876cf56558ca5
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.
Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
Improved the subset block search and fill. (about 3% improvement for
32 bit) Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.
Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost
Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.
Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3