generic-library/vpx

Author	SHA1	Message	Date
Johann	022323bf85	reorder data to use wider instructions the previous commit laid the groundwork by doing two sets of idcts together. this moved that further by grouping the interesting data (q[0], q+16[0]) together to allow using wider instructions. also managed to drop a few instructions by recognizing that the constant for sinpi8sqrt2 could be downshifted all the time which avoided a downshift as well as workarounds for a function which only accepted signed data looks like a modest gain for performance: at qcif, went from ~180 fps to ~183 Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf	2010-09-17 16:47:39 -04:00
Yunqing Wang	f857a85088	Restructure multi-threaded decoder On each MB, loopfiltering is done right after MB decoding. This combines two loops in multi-threaded code into one, which reduces number of synchronizations to half. The above-row/left-col data are saved in temp buffers for next-row/next MB decoding. Tests on 4-core gLucid machine showed 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms isn't done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9	2010-09-17 09:56:05 -04:00
John Koleszar	9100073e8d	cleanup: remove unused xprintf These files aren't currently used, and we can get them back if we need them. Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5	2010-09-16 13:14:12 -04:00
Scott LaVarnway	c5fb0eb8d9	Improved subset block search Improved the subset block search and fill. (about 3% improvement for 32 bit) Modified/merged the code in order to create vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock level. This will allow the decode loop (in the future) to decode modes/mvs on a frame, row, or mb level. Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3	2010-09-09 14:42:48 -04:00
Johann	14ba764219	Update NEON wide idcts Expand `93c32a55` which used SSE2 instructions to do two idct/dequant/recons at a time to NEON. Initial working commit. More work needs to be put into rearranging and interlacing the data to take advantage of quadword operations, which is when we'll hopefully see a much better boost Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1	2010-09-09 14:08:12 -04:00
John Koleszar	c2140b8af1	Use WebM in copyright notice for consistency Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba	2010-09-09 10:01:21 -04:00
Scott LaVarnway	0de458f6b9	Reduced the size of MB_MODE_INFO Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613	2010-09-03 16:43:23 -04:00
Frank Galligan	d45e55015e	Fix rare deadlock before loop filter There was an extremely rare deadlock that happened when one thread was waiting to start the loop filter on frame n while the other threads were starting to work on frame n+1. Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef	2010-09-01 22:01:21 -04:00
Yunqing Wang	0e78efad0b	Replace sleep(0) calls in multi-threaded decoder This is a workaround for gLucid problem. Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa	2010-08-31 20:37:11 -04:00
Johann	0b94f5d6e8	followup arm patch make the arm asm detokenizer work with the new structures Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a	2010-08-31 11:41:10 -04:00
Scott LaVarnway	e85e631504	Changed above and left context data layout The main reason for the change was to reduce cycles in the token decoder. (~1.5% gain for 32 bit) This layout should be more cache friendly. As a result of this change, the encoder had to be updated. Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837 Note: dixie uses a similar layout	2010-08-31 11:24:30 -04:00
Johann	5c244398e1	clean up compiler warnings did a test compile with clang and got rid of some warnings that have been annoying me for a while: vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer': vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type Change-Id: I78ddab176fe47cbeed30379709dc7bab01c0c2e4	2010-08-24 18:23:16 -04:00
Johann	d73217ab17	update structures mbmi and eob moved in previous commits Change-Id: I30a2eba36addf89ee50b406ad4afdd059a832711	2010-08-23 13:44:56 -04:00
Fritz Koenig	93c32a55c2	Rework idct calling structure. Moving the eob structure allows for a non-struct based function to handle decoding an entire mb of idct/dequant/recon data. This allows for SIMD functions to idct/dequant/recon multiple blocks at once. SSE2 implementation gives 3% gain on Atom. Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2	2010-08-23 08:58:54 -07:00
Johann	9602799cd9	framework for assembly version of the detokenizer adds a compile time option: --enable-arm-asm-detok which pulls in vp8/decoder/arm/detokenize.asm currently about break even speed wise, but changes are pending to the fill code (branch and load 3 bytes versus conditionally always load one) and the error handling. Currently it doesn't handle zero runs or overrunning the buffer. this is really just so i don't have to rebase my changes all the time to run benchmarks - now just need to replace one file! Change-Id: I56d0e2354dc0ca3811bffd0e88fe1f952fa6c797	2010-08-12 16:39:56 -04:00
Scott LaVarnway	9c7a0090e0	Removed unnecessary MB_MODE_INFO copies These copies occurred for each macroblock in the encoder and decoder. The temp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result, a large number of compile errors had to be fixed. Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3	2010-08-12 16:25:43 -04:00
Scott LaVarnway	99f46d62d9	Moved gf_active code to encoder only The gf_active code is only used by the encoder, so it was moved from common and decoder. Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025	2010-08-11 11:54:25 -04:00
Yunqing Wang	ba2e107d28	First modification of multi-thread decoder This is the first modification of the VP8 multi-thread decoder, which uses the same threads to decode macroblocks and then do loopfiltering for each frame. Inspired by Rob Clark, synchronization was done on every 8 macroblocks instead of every macroblock to reduce lock contention. Compared with the original code, this implementation gave about 15%-20% performance gain while decoding my test clips on a Core2 Quad platform (Linux). The work is not done yet. Tests on other platforms are needed. Change-Id: Ice9ddb0b511af1359b9f71e65066143c04fef3b5	2010-08-10 14:09:57 -04:00
John Koleszar	675298216d	Merge "Replace pinsrw (SSE) with MMX instructions"	2010-08-02 06:16:26 -07:00
Philip Jägenstedt	7d243701d9	Replace pinsrw (SSE) with MMX instructions Fixes http://code.google.com/p/webm/issues/detail?id=136 Change-Id: I5a3e294061644a1a9718e8ba4a39548ede25cc42	2010-08-02 09:15:45 -04:00
John Koleszar	38a20e030f	apple: include proper mach primitives Fixes implicit declaration warning for 'mach_task_self'. Patch courtesy of timeless at gmail.com Change-Id: I9991dedd1ccfddc092eca86705ecbc3b764b799d	2010-07-29 17:04:44 -04:00
Johann	b9a038a5ed	Fix build w/o RTCD So many places to update ... Change-Id: Ide957b40cc833f99c2d1849acade6850fbf7585d	2010-07-27 11:56:19 -04:00
Johann	56f5a9a060	update arm idct functions Jeff Muizelaar posted some changes to the idct/reconstruction c code. This is the equivalent update for the arm assembly. This shows a good boost on v6, and a minor boost on neon. Here are some numbers for highway in qcif, 2641 frames: HEAD neon: ~161 fps new neon: ~162 fps HEAD v6: ~102 fps new v6: ~106 fps The following functions have been updated for armv6 and neon: vp8_dc_only_idct_add vp8_dequant_idct_add vp8_dequant_dc_idct_add Conflicts: vp8/decoder/arm/armv6/dequantdcidct_v6.asm vp8/decoder/arm/armv6/dequantidct_v6.asm Resolved by removing these files. When I rewrote the functions, I also moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4	2010-07-26 08:55:19 -04:00
Jeff Muizelaar	98fcccfe97	Change the x86 idct functions to do reconstruction at the same time Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f	2010-07-23 15:21:36 -04:00
Jeff Muizelaar	b2fa74ac18	Combine idct and reconstruction steps This moves the prediction step before the idct and combines the idct and reconstruction steps into a single step. Combining them seems to give an overall decoder performance improvement of about 1%. Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318	2010-07-23 15:21:36 -04:00
Fritz Koenig	0ce3901282	Swap alt/gold/new/last frame buffer ptrs instead of copying. At the end of the decode, frame buffers were being copied. The frames are not updated after the copy, they are just for reference on later frames. This change allows multiple references to the same frame buffer instead of copying it. Changes needed to be made to the encoder to handle this. The encoder is still doing frame buffer copies in similar places where pointer reference could be done. Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66	2010-07-23 14:53:59 -04:00
Fritz Koenig	08eed049d4	Remove CONFIG_NEW_TOKENS files. These files were out of date and no longer maintained. Token decoding has implemented the no-crash code which is incompatible with this arm assembly code. Change-Id: Ibf729886c56fca48181af60b44bda896c30023fc	2010-07-22 19:00:21 -04:00
Michael Kohler	80f0e7a7d0	limit range checking code for L[k] to CONFIG_DEBUG. patch by timeless@gmail.com	2010-07-12 18:41:45 +02:00
John Koleszar	308e867f91	Update loopfilter frame/filter/sharp info for multithread Change I9fd1a5a4 updated the multithreaded loopfilter to avoid reinitializing several parameters if they haven't changed from the last frame, but the code to update the last frame's parameters wasn't invoked in the multithreaded case. Change-Id: Ia23d937af625c01dd739608e02d110f742b7e1f2	2010-06-30 10:23:53 -04:00
Yunqing Wang	29d586b462	Add loopfilter initialization fix in multithreading code Modified loopfilter initialization to avoid unnecessary operations. Change-Id: I9fd1a5a49edc1cb8116c2a72a6908b1e437459ec	2010-06-30 09:42:39 -04:00
John Koleszar	94c52e4da8	cosmetics: trim trailing whitespace When the license headers were updated, they accidentally contained trailing whitespace, so unfortunately we have to touch all the files again. Change-Id: I236c05fade06589e417179c0444cb39b09e4200d	2010-06-18 13:06:11 -04:00
Timothy B. Terriberry	c17b62e1bd	Change bitreader to use a larger window. Change bitreading functions to use a larger window which is refilled less often. This makes it cheap enough to do bounds checking each time the window is refilled, which avoids the need to copy the input into a large circular buffer. This uses less memory and speeds up the total decode time by 1.6% on an ARM11, 2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64. Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving the refill loop to the front of vp8dx_decode_bool(). However, having the refill loop between computation of the split values and the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to memory latency and code size: refilling after normalization duplicates the code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases). Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the beginning of each decode step in vp8_decode_mb_tokens() means the latter requires an extra refill at the end. Platform-specific versions could avoid the problem, but would require most of detokenize.c to be duplicated. Change-Id: I16c782a63376f2a15b78f8086d899b987204c1c7	2010-06-15 19:55:14 -07:00
Paul Wilkins	7a81b29d38	Use local pointer to pbi->common.	2010-06-11 15:17:57 +01:00
John Koleszar	fb220d257b	replace while(0) construct with if/else No good reason to be tricky here. I don't know why 'break' occurred to me as the natural replacement for the 'return', but an if/else block is definitely clearer. Change-Id: I08a336307afeb0dc7efa494b37398f239f66c2cf	2010-06-10 20:15:21 -04:00
Timothy B. Terriberry	05c6eca4db	Fix new MV clamping scheme for chroma MVs. The new scheme introduced in I68d35a2f did not clamp chroma MVs in the SPLITMV case, and clamped them incorrectly (to the luma plane bounds) in every other case. Because chroma MVs are computed from the luma MVs before clamping occurs, they could still point outside of the frame buffer and cause crashes. This clamping happens outside of the MV prediction loop, and so should not affect bitstream decoding.	2010-06-10 18:42:24 -04:00
John Koleszar	3085025fa1	Remove secondary mv clamping from decode stage This patch removes the secondary MV clamping from the MV decoder. This behavior was consistent with limits placed on non-split MVs by the reference encoder, but was inconsistent with the MVs generated in the split case. The purpose of this secondary clamping was only to prevent crashes on invalid data. It was not intended to be a behaviour an encoder could or should rely on. Instead of doing additional clamping in a way that changes the entropy context, the secondary clamp is removed and the border handling is made implementation specific. With respect to the spec, the border is treated as essentially infinite, limited only by the clamping performed on the near/nearest reference and the maximum encodable magnitude of the residual MV. This does not affect any currently produced streams. Change-Id: I68d35a2fbb51570d6569eab4ad233961405230a3	2010-06-09 11:47:24 -04:00
Philip Jägenstedt	0dd78af3e9	remove unreferenced variable i	2010-06-07 11:35:33 -04:00
John Koleszar	09202d8071	LICENSE: update with latest text Change-Id: Ieebea089095d9073b3a94932791099f614ce120c	2010-06-04 16:19:40 -04:00
Yunqing Wang	d33bf3d664	Remove costly memory reads/writes in vp8_reset_mb_tokens_context() Tests on x86 showed this function cost 2.7% of total decoding time because of all the memory reads/writes. After modification, it only costs about 0.7% of decoding time, which gives a 2% gain. Change-Id: I5003ee30b6dc6dea0bfa42a6ad7e7c22fcc7b215	2010-06-01 07:59:50 -04:00
John Koleszar	b7492341ac	install includes in DIST_DIR/include/vpx, move vpx_codec/ to vpx/ This renames the vpx_codec/ directory to vpx/, to allow applications to more consistently reference these includes with the vpx/ prefix. This allows the includes to be installed in /usr/local/include/vpx rather than polluting the system includes directory with an excessive number of includes. Change-Id: I7b0652a20543d93f38f421c60b0bbccde4d61b4f	2010-05-24 20:27:42 -04:00
John Koleszar	0ea50ce9cb	Initial WebM release	2010-05-18 11:58:33 -04:00
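
Commit 0ce3901282 above describes replacing end-of-decode frame buffer copies with multiple references to the same buffer. The following is a minimal sketch of that idea in C, not the code from the commit: the FramePool type, the function names, the pool of four buffers, and the rule that "last" is always refreshed are all assumptions made for illustration.

```c
#include <assert.h>

/* Illustrative sketch only: every name here is hypothetical and does
 * not mirror libvpx's actual structures. */
enum { NUM_BUFFERS = 4 };

typedef struct {
    int ref_count[NUM_BUFFERS]; /* number of roles pointing at each buffer */
    int last_idx;               /* "last" reference frame */
    int golden_idx;             /* "golden" reference frame */
    int alt_idx;                /* "alt ref" reference frame */
    int new_idx;                /* buffer the current frame decodes into */
} FramePool;

/* Return the index of a buffer that no role refers to, or -1 if none. */
static int find_free(const FramePool *p)
{
    for (int i = 0; i < NUM_BUFFERS; i++)
        if (p->ref_count[i] == 0)
            return i;
    return -1;
}

/* Repoint one role at buffer `buf`, keeping the counts consistent. */
static void ref_assign(FramePool *p, int *role, int buf)
{
    p->ref_count[*role]--;
    *role = buf;
    p->ref_count[buf]++;
}

/* All three references start on buffer 0; decoding targets buffer 1. */
void pool_init(FramePool *p)
{
    for (int i = 0; i < NUM_BUFFERS; i++)
        p->ref_count[i] = 0;
    p->last_idx = p->golden_idx = p->alt_idx = 0;
    p->ref_count[0] = 3;
    p->new_idx = 1;
    p->ref_count[1] = 1;
}

/* After a frame is decoded, update references by index instead of
 * copying pixel data. Assumption: "last" is refreshed on every frame. */
void end_of_frame(FramePool *p, int update_golden, int update_alt)
{
    if (update_golden)
        ref_assign(p, &p->golden_idx, p->new_idx);
    if (update_alt)
        ref_assign(p, &p->alt_idx, p->new_idx);
    ref_assign(p, &p->last_idx, p->new_idx);

    /* Drop the "new" role's hold and pick an unreferenced buffer for
     * the next frame. With three reference roles and four buffers, one
     * buffer is always free at this point. */
    p->ref_count[p->new_idx]--;
    int next = find_free(p);
    assert(next >= 0);
    p->new_idx = next;
    p->ref_count[next]++;
}
```

The counts let several roles share one buffer safely: a buffer is reused for decoding only once no reference points at it, so no frame data needs to be copied when golden or alt ref is refreshed.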

291 Commits