194 Commits

Author SHA1 Message Date
John Koleszar
cdd2066687 unset execute bit on c source
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
2010-09-21 19:48:06 -04:00
John Koleszar
6f4c0435d1 Merge "Don't reset mb clamping state during splitmv decoding" 2010-09-21 09:06:59 -07:00
John Koleszar
4d391e8ed2 Don't reset mb clamping state during splitmv decoding
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. Instead, the state is only
set during splitmv decoding, never cleared.

Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
2010-09-21 11:58:48 -04:00
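
The bug pattern described in 4d391e8ed2 can be sketched as follows. This is a minimal illustration, not libvpx code: the `needs_clamp` array and both function names are hypothetical stand-ins for the per-partition decision made during splitmv decoding.

```c
#include <stddef.h>

/* Hypothetical model: needs_clamp[i] is nonzero when the motion vector of
 * partition i would require clamping. These are not libvpx's structures. */

/* Buggy pattern: the flag is overwritten for every partition, so only the
 * last partition decides whether the macroblock is clamped. */
int mb_needs_clamp_buggy(const int *needs_clamp, size_t n)
{
    int flag = 0;
    for (size_t i = 0; i < n; i++)
        flag = needs_clamp[i] ? 1 : 0;   /* discards earlier partitions */
    return flag;
}

/* Fixed pattern: the flag is only ever set inside the loop, never cleared. */
int mb_needs_clamp_sticky(const int *needs_clamp, size_t n)
{
    int flag = 0;
    for (size_t i = 0; i < n; i++)
        if (needs_clamp[i])
            flag = 1;
    return flag;
}
```

With the first partition needing clamping and the later ones not, the buggy variant reports that no clamping is needed, while the sticky variant still reports that the macroblock must be clamped.
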
John Koleszar
015cfcafbd Merge "Add high limit check for unsigned parameters" 2010-09-21 05:36:46 -07:00
Yunqing Wang
a23ccf8f8c Merge "Restructure multi-threaded decoder" 2010-09-21 05:00:30 -07:00
Fritz Koenig
b7dc9398f2 Use movq instead of movdqu.
Movdqu is more expensive (throughput, uops) than movq.  Minimal
impact for newer big cores, but ~2.25% gain on Atom.

Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
2010-09-20 11:34:26 -07:00
Fritz Koenig
1c906448cc Merge "Better choice of instruction filter mask comparison." 2010-09-20 11:01:51 -07:00
Johann
6cf2b4aa0e Merge "reorder data to use wider instructions" 2010-09-20 10:47:33 -07:00
Johann
9c9afbab85 Merge "Update NEON wide idcts" 2010-09-20 10:47:22 -07:00
Fritz Koenig
8eae7fe7e8 Better choice of instruction filter mask comparison.
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.

Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
2010-09-20 10:20:38 -07:00
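
To see why the substitution in 8eae7fe7e8 is valid, here is a scalar model of a single byte lane. It assumes the original code applied one psubusb per difference and OR-ed the results with por, and that the new code keeps a running pmaxub and subtracts the limit once; the commit does not show the exact instruction sequence, and all function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte lane of psubusb: unsigned saturating subtract. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Assumed old scheme: a saturating subtract per difference, OR-ed (por). */
uint8_t over_limit_sub_or(const uint8_t *diffs, size_t n, uint8_t limit)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= sat_sub_u8(diffs[i], limit);
    return acc;                       /* nonzero if any diff > limit */
}

/* Assumed new scheme: running maximum (pmaxub), then a single subtract. */
uint8_t over_limit_max(const uint8_t *diffs, size_t n, uint8_t limit)
{
    uint8_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (diffs[i] > m)
            m = diffs[i];
    return sat_sub_u8(m, limit);      /* nonzero if the largest diff > limit */
}
```

The two functions may return different nonzero values, but they agree on zero versus nonzero, which is all a mask comparison needs.
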
Guillermo Ballester Valor
236906863a Add high limit check for unsigned parameters
The patch for issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It was a trick to confuse the compiler rather
than a real fix.

This patch fixes it by adding a new macro for the cases that need only
a high limit check on an unsigned value.

Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
2010-09-20 10:03:05 -04:00
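
A minimal sketch of the idea behind 236906863a, using hypothetical macro names (the commit text does not give the actual ones). A two-sided range check on an unsigned value with a lower bound of 0 contains a comparison that is always true, which some compilers warn about; a high-limit-only macro avoids the comparison altogether.

```c
/* Hypothetical macros for illustration; not the names used in libvpx. */
#define IN_RANGE(val, lo, hi)  ((val) >= (lo) && (val) <= (hi))
#define IN_RANGE_HI(val, hi)   ((val) <= (hi))

/* Signed parameter: both bounds are meaningful. */
int check_signed(int v)
{
    return IN_RANGE(v, 0, 63);
}

/* Unsigned parameter: "v >= 0" would always be true, so only the upper
 * bound is tested. */
int check_unsigned(unsigned int v)
{
    return IN_RANGE_HI(v, 63u);
}
```
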
Johann
022323bf85 reorder data to use wider instructions
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
downshift as well as workarounds for a function which only accepted
signed data

looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
2010-09-17 16:47:39 -04:00
Yunqing Wang
f857a85088 Restructure multi-threaded decoder
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which halves
the number of synchronizations.

The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.

Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.

Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
2010-09-17 09:56:05 -04:00
John Koleszar
9100073e8d cleanup: remove unused xprintf
These files aren't currently used, and we can get them back if we
need them.

Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
2010-09-16 13:14:12 -04:00
John Koleszar
147b125b15 Reduce size of tokenizer tables
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.

Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
2010-09-16 10:00:04 -04:00
Fritz Koenig
769f2424cc Removed unnecessary pxor.
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.

Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
2010-09-13 18:34:34 -07:00
Fritz Koenig
71a1c19754 Merge "Make block access to frame buffer sequential" 2010-09-13 11:04:22 -07:00
Fritz Koenig
a65cd3def0 Make block access to frame buffer sequential
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.

Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
2010-09-10 16:27:28 -07:00
Scott LaVarnway
a32ded1d5f Merge "Improved subset block search" 2010-09-09 11:51:29 -07:00
Scott LaVarnway
c5fb0eb8d9 Improved subset block search
Improved the subset block search and fill (about 3% improvement for
32 bit). Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.

Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
2010-09-09 14:42:48 -04:00
Johann
14ba764219 Update NEON wide idcts
Extend 93c32a55, which used SSE2 instructions to do two
idct/dequant/recons at a time, to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost.

Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
2010-09-09 14:08:12 -04:00
John Koleszar
edcbb1c199 Fix GF interval for non-lagged ARFs
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.

Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
2010-09-09 13:18:54 -04:00
Fritz Koenig
6d90f867e4 Merge branch 'master' of git://review.webmproject.org/libvpx 2010-09-09 08:54:21 -07:00
John Koleszar
c2140b8af1 Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.

Fixes issue #97.

Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-09 10:01:21 -04:00
Jim Bankoski
69ae8f475d Skip unnecessary search of identical frames
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.

Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
2010-09-08 11:31:34 -04:00
Jim Bankoski
63ccfbd545 Enable ARFs for non-lagged compress
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.

Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
2010-09-08 11:26:13 -04:00
Fritz Koenig
3fb37162a8 Bilinear subpixel optimizations for ssse3.
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.

Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
2010-09-07 17:19:40 -07:00
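
As a rough illustration of 3fb37162a8, the following scalar model shows what one 16-bit lane of pmaddubsw computes and how it covers both taps of a two-tap filter in a single step. The 7-bit shift, the +64 rounding term, and the requirement that each tap fits in a signed byte are assumptions of this sketch, as are the function names; the actual SSSE3 code is not shown in the commit message.

```c
#include <stdint.h>

/* One 16-bit lane of pmaddubsw: two unsigned bytes times two signed bytes,
 * with the two products added using signed 16-bit saturation. */
int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Two-tap bilinear filter of neighbouring pixels p0 and p1.
 * Assumes non-negative taps summing to 128, hence the >> 7 with +64
 * rounding. */
uint8_t bilinear_2tap(uint8_t p0, uint8_t p1, int8_t t0, int8_t t1)
{
    int32_t acc = maddubs_lane(p0, p1, t0, t1);
    return (uint8_t)((acc + 64) >> 7);
}
```

For example, with equal taps of 64 the filter averages the two pixels, so inputs 100 and 200 give 150.
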
Scott LaVarnway
0de458f6b9 Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK.  Also reduced the size of other members
of the MB_MODE_INFO struct.  For 1080p, the memory was reduced
by 1,209,516 bytes.  The decoder performance appeared to improve
by 3% for the clip used.
Note:  The main goal for this change is to improve the decoder
performance.  The encoder will be revisited at a later date for
further structure cleanup.

Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-09-03 16:43:23 -04:00
John Koleszar
4496db45e3 Whitespace: nuke CRLFs
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
2010-09-02 13:33:01 -04:00
James Zern
76640f85da encoder: remove postproc dependency
Remove the dependency on postproc.c for the encoder in general. The only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.

Additionally, an error will be returned when VP8_SET_POSTPROC is used
with the encoder while post processing is disabled.

This addresses issue #153.

Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
2010-09-02 11:52:37 -04:00
John Koleszar
7a3e0a1d93 Merge "added separate rounding/zbin constants for 2nd order" 2010-09-02 08:42:29 -07:00
John Koleszar
9398be0f46 Merge "Disable frame dropping by default" 2010-09-02 08:41:46 -07:00
Yaowu Xu
fca129203a added separate rounding/zbin constants for 2nd order
This allows experiments with different rounding and
zerobin constants for 2nd order blocks.

Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
2010-09-02 10:27:03 -04:00
John Koleszar
23216211bc Disable frame dropping by default
This is not the behavior that most users expect.

Change-Id: I226126ea400c22cf1f7918e80ea7fe0771c569cb
2010-09-02 09:32:03 -04:00
Frank Galligan
d45e55015e Fix rare deadlock before loop filter
There was an extremely rare deadlock that happened when one thread
was waiting to start the loop filter on frame n while the other
threads were starting to work on frame n+1.

Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
2010-09-01 22:01:21 -04:00
Paul Wilkins
18c902f8a4 Merge "Improved Force Key Frame Behaviour" 2010-09-01 02:45:12 -07:00
Yunqing Wang
0e78efad0b Replace sleep(0) calls in multi-threaded decoder
This is a workaround for a gLucid problem.

Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
2010-08-31 20:37:11 -04:00
Paul Wilkins
c239a1b67c Improved Force Key Frame Behaviour
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.

The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.

Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
2010-08-31 14:32:40 -04:00
Johann
0b94f5d6e8 followup arm patch
make the arm asm detokenizer work with the new structures

Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
2010-08-31 11:41:10 -04:00
Scott LaVarnway
e85e631504 Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder (~1.5% gain for 32 bit). This layout should be more
cache friendly.

As a result of this change, the encoder had to be updated.

Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
2010-08-31 11:24:30 -04:00
John Koleszar
aaad6d1b54 Merge "Fix harmless off-by-1 error." 2010-08-30 12:40:42 -07:00
John Koleszar
674e477b81 Merge "increase rate control buffer level precision" 2010-08-30 07:49:35 -07:00
Timothy B. Terriberry
7a8e0a2935 Fix harmless off-by-1 error.
The memory being zeroed in vp8_update_mode_info_border() was just
 allocated with calloc, and so the entire function is actually
 redundant, but it should be made correct in case someone expects
 it to actually work in the future.

Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79
2010-08-27 16:07:54 -07:00
Johann
5c244398e1 clean up compiler warnings
did a test compile with clang and got rid of some warnings that have
been annoying me for a while:
vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer':
vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type
vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type

Change-Id: I78ddab176fe47cbeed30379709dc7bab01c0c2e4
2010-08-24 18:23:16 -04:00
Johann
d73217ab17 update structures
mbmi and eob moved in previous commits

Change-Id: I30a2eba36addf89ee50b406ad4afdd059a832711
2010-08-23 13:44:56 -04:00
Fritz Koenig
93c32a55c2 Rework idct calling structure.
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data.  This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.

SSE2 implementation gives 3% gain on Atom.

Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
2010-08-23 08:58:54 -07:00
John Koleszar
8e7ebacb19 increase rate control buffer level precision
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.

Change-Id: If8dd2a87614c05747f81432cbe75dd9e6ed2f04e
2010-08-20 11:04:48 -04:00
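
The precision loss described in 8e7ebacb19 can be illustrated with plain integer arithmetic. Both function names and the parameter layout are hypothetical and do not reflect the encoder's actual code; the sketch only contrasts truncating to whole seconds before converting to bits with converting at full millisecond precision.

```c
#include <stdint.h>

/* Truncating milliseconds to whole seconds first drops the fraction. */
int64_t level_bits_truncated(int64_t level_ms, int64_t bits_per_sec)
{
    return (level_ms / 1000) * bits_per_sec;
}

/* Multiplying before dividing keeps the millisecond precision. */
int64_t level_bits_full(int64_t level_ms, int64_t bits_per_sec)
{
    return level_ms * bits_per_sec / 1000;
}
```

At 256000 bits per second, a 1500 ms buffer level becomes 256000 bits when truncated but 384000 bits at full precision, and a 600 ms level collapses to 0 bits when truncated.
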
Jim Bankoski
b0660457fe Revert "Removed ssse3 sixtap code"
This reverts commit 6ea5bb85cd.
2010-08-19 15:58:27 -04:00
Johann
52852da7c9 cleanup simple loop filter
move some things around, reorder some instructions

constant 0 is used several times. load it once per call in horiz,
once per loop in vert.

separate saturating instructions to avoid stalls.

just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0

document some stalls for further consideration

Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
2010-08-19 13:37:40 -04:00
Johann
a522be2941 Merge "fix armv6 simpleloop filter" 2010-08-19 08:31:57 -07:00