generic-library/vpx

Author	SHA1	Message	Date
Johann	f143a81191	Merge "Fix valgrind errors in the NEON loop filters."	2010-10-01 06:18:53 -07:00
Timothy B. Terriberry	a465076e02	Fix valgrind errors in the NEON loop filters. Like the ARMv6 code, these functions were accessing values below the stack pointer, which can be corrupted by signal delivery at any time.	2010-09-30 20:40:45 -07:00
John Koleszar	a047fee606	Merge "Fix loopfilter delta zero transitions"	2010-09-30 10:26:10 -07:00
Fritz Koenig	439b2ecd74	Merge "Optimizations on the loopfilters."	2010-09-29 10:47:01 -07:00
John Koleszar	b9be7a464f	Fix loopfilter delta zero transitions Loopfilter deltas are initialized to zero on keyframes in the decoder. The values then persist from the previous frame unless an update bit is set in the bitstream. This data is not included in the entropy data saved by the 'refresh entropy' bit in the bitstream, so it is effectively an additional contextual element beyond the 3 ref-frames and the entropy data. The encoder was treating this delta update bit as update-if-nonzero, meaning that the value would be refreshed even if it hadn't changed, and more significantly, if the correct value for the delta changed to zero, the update wouldn't be sent, and the decoder would preserve the last (presumably non-zero) value. This patch updates the encoder to send an update only if the value has changed from the previously transmitted value. It also forces the value to be transmitted in error resilient mode, to account for lost context in the event of lost frames. Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868	2010-09-29 13:04:04 -04:00
Fritz Koenig	0964ef0e71	Optimizations on the loopfilters. - Scheduling for Atom processors - Combining of macros to allow for better interleaving - Change from multiplies to adds for main filter - Use of movhps/movlps to fill xmm registers without shifting and orring Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b	2010-09-28 12:01:34 -07:00
Timothy B. Terriberry	18dc92fd66	Add 4-tap version of 2nd-pass ARMv6 MC filter. The existing code applied a 6-tap filter with 0's on either end. We're already paying the branch penalty to avoid computing the two extra columns needed as input to this filter. We might as well save time computing the filter as well. This reduces the inner loop from 21 instructions to 16, the number of loads per iteration from 4 to 1, and the number of multiplies from 7 to 4. The gain in overall decoding performance, however, is small (less than 1%). This change also means we now valgrind clean on ARMv6, which is its real purpose. The errors reported here were valgrind's fault (it does not detect that 0 times an uninitialized value is initialized), but Julian Seward says it would slow down valgrind considerably to make such checks. Speeding up libvpx rather, even by a small amount, seems a much better idea if only to enable proper valgrind checking of the rest of the codec. Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16	2010-09-27 18:25:45 -07:00
John Koleszar	2b521ab551	move reconintra_mt to decoder (fixup) Missed the .h file in the move. Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc	2010-09-27 12:48:31 -04:00
Johann	063be9b82a	Merge "combine max values and compare once"	2010-09-27 06:39:20 -07:00
Timothy B. Terriberry	e2795e9978	Fix valgrind errors in vp8_sixtap_predict8x4_armv6(). This function was accessing values below the stack pointer, which can be corrupted by signal delivery at any time. Change-Id: I92945b30817562eb0340f289e74c108da72aeaca	2010-09-24 14:34:18 -07:00
Johann	f30e8dd7bd	combine max values and compare once previous implementation compared each set of values to limit and then &'d them together, requiring a compare and & for each value. this does the accumulation first, requiring only one compare Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323	2010-09-24 15:42:50 -04:00
John Koleszar	48e76ff4fd	move reconintra_mt to decoder (for now) reconintra_mt.c is only required for building the decoder right now. It could definitely be used for the encoder in the future, but it currently depends on decoder only data structures. (onyxd_int.h, VP8D_COMP, etc). Move it from common/ to decoder/ until the necessary changes to the common multithread code are complete. This patch is needed to build with --disable-vp8-decoder. Change-Id: I568c52221a2b309234d269675cba97131ce35c86	2010-09-24 11:23:06 -04:00
Johann	7fed3832e7	Remove dead code The new loopfilter was originally introduced as an experimental change. It's permanent now. Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546	2010-09-22 11:07:34 -04:00
Yunqing Wang	a23ccf8f8c	Merge "Restructure multi-threaded decoder"	2010-09-21 05:00:30 -07:00
Fritz Koenig	b7dc9398f2	Use movq instead of movdqu. Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f	2010-09-20 11:34:26 -07:00
Fritz Koenig	8eae7fe7e8	Better choice of instruction filter mask comparison. Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82	2010-09-20 10:20:38 -07:00
Yunqing Wang	f857a85088	Restructure multi-threaded decoder On each MB, loopfiltering is done right after MB decoding. This combines two loops in multi-threaded code into one, which reduces number of synchronizations to half. The above-row/left-col data are saved in temp buffers for next-row/next MB decoding. Tests on 4-core gLucid machine showed 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms isn't done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9	2010-09-17 09:56:05 -04:00
Fritz Koenig	769f2424cc	Removed unnecessary pxor. There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1	2010-09-13 18:34:34 -07:00
Fritz Koenig	a65cd3def0	Make block access to frame buffer sequential Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d	2010-09-10 16:27:28 -07:00
Fritz Koenig	6d90f867e4	Merge branch 'master' of git://review.webmproject.org/libvpx	2010-09-09 08:54:21 -07:00
John Koleszar	c2140b8af1	Use WebM in copyright notice for consistency Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba	2010-09-09 10:01:21 -04:00
Fritz Koenig	3fb37162a8	Bilinear subpixel optimizations for ssse3. Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea	2010-09-07 17:19:40 -07:00
Scott LaVarnway	0de458f6b9	Reduced the size of MB_MODE_INFO Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613	2010-09-03 16:43:23 -04:00
James Zern	76640f85da	encoder: remove postproc dependency Remove the dependency on postproc.c for the encoder in general, the only unchecked need for it is when CONFIG_PSNR is enabled. All other cases are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file will still be included. Additionally, when VP8_SET_POSTPROC is used with the encoder when post processing has been disabled an error will be returned. This addresses issue #153. Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089	2010-09-02 11:52:37 -04:00
Yunqing Wang	0e78efad0b	Replace sleep(0) calls in multi-threaded decoder This is a workaround for gLucid problem. Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa	2010-08-31 20:37:11 -04:00
Johann	0b94f5d6e8	followup arm patch make the arm asm detokenizer work with the new structures Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a	2010-08-31 11:41:10 -04:00
Scott LaVarnway	e85e631504	Changed above and left context data layout The main reason for the change was to reduce cycles in the token decoder. (~1.5% gain for 32 bit) This layout should be more cache friendly. As a result of this change, the encoder had to be updated. Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837 Note: dixie uses a similar layout	2010-08-31 11:24:30 -04:00
Timothy B. Terriberry	7a8e0a2935	Fix harmless off-by-1 error. The memory being zeroed in vp8_update_mode_info_border() was just allocated with calloc, and so the entire function is actually redundant, but it should be made correct in case someone expects it to actually work in the future. Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79	2010-08-27 16:07:54 -07:00
Fritz Koenig	93c32a55c2	Rework idct calling structure. Moving the eob structure allows for a non-struct based function to handle decoding an entire mb of idct/dequant/recon data. This allows for SIMD functions to idct/dequant/recon multiple blocks at once. SSE2 implementation gives 3% gain on Atom. Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2	2010-08-23 08:58:54 -07:00
Jim Bankoski	b0660457fe	Revert "Removed ssse3 sixtap code" This reverts commit `6ea5bb85cd`.	2010-08-19 15:58:27 -04:00
Johann	52852da7c9	cleanup simple loop filter move some things around, reorder some instructions constant 0 is used several times. load it once per call in horiz, once per loop in vert. separate saturating instructions to avoid stalls. just use one usub8 call to set GE flags, rather than uqsub8 followed by usub8 w/ 0 document some stalls for further consideration Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec	2010-08-19 13:37:40 -04:00
Johann	a522be2941	Merge "fix armv6 simpleloop filter"	2010-08-19 08:31:57 -07:00
Johann	467a0b99ab	fix armv6 simpleloop filter test cases were causing a crash because the count was being read incorrectly. after fixing that, noticed that the output was not matching. fixed that. Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27	2010-08-19 11:29:21 -04:00
Scott LaVarnway	6ea5bb85cd	Removed ssse3 sixtap code Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4	2010-08-18 15:34:09 -04:00
Johann	c75f3993c0	store more vars than we removed only saved r4-11+lr, but were storing r4-r12+lr Change-Id: If77df1998af50e9badee7d99ef53543046434675	2010-08-16 10:32:15 -04:00
John Koleszar	80d3923a78	move segmentation_common to encoder vp8_update_gf_useage_maps() is only used by the encoder. This patch fixes the ability to build in decode-only or encode-only configurations. Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa	2010-08-13 14:54:24 -04:00
Johann	633646b73b	update structure mode_info_context->mbmi no longer gets copied up a level Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995	2010-08-12 16:37:55 -04:00
Johann	1ec7981c34	remove unused definition asm_offsets contains some definitions which are no longer used. this was one of them. v6 build works now Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df	2010-08-12 16:37:55 -04:00
Scott LaVarnway	9c7a0090e0	Removed unnecessary MB_MODE_INFO copies These copies occurred for each macroblock in the encoder and decoder. The temp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result, a large number of compile errors had to be fixed. Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3	2010-08-12 16:25:43 -04:00
Scott LaVarnway	f5615b6149	Merge "Finished vp8_sixtap_predict4x4_ssse3 function"	2010-08-11 12:23:24 -07:00
John Koleszar	392a958274	avoid negative array subscript warnings The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV and LEFT4X4, respectively, rather than being zero-based like the other token encodings. Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5	2010-08-11 13:49:12 -04:00
Scott LaVarnway	b07e5b6fa1	Finished vp8_sixtap_predict4x4_ssse3 function Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3 assembly routines. Also removed unused assembly. Change-Id: I01c1021835f2edda9da706822345f217087ca0d0	2010-08-11 13:49:00 -04:00
Johann	c0ba42d3c0	rename DETOK_[AL] everything else uses lowercase detok Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04	2010-08-11 13:36:35 -04:00
Scott LaVarnway	99f46d62d9	Moved gf_active code to encoder only The gf_active code is only used by the encoder, so it was moved from common and decoder. Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025	2010-08-11 11:54:25 -04:00
Scott LaVarnway	e4fe866949	Added ssse3 version of sixtap filters Improved decoder performance by 9% for the clip used. Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9	2010-08-10 17:33:49 -04:00
John Koleszar	618c7d27a0	Mark loopfilter C functions as static Clang defaults to C99 mode, and inline works differently in C99. (gcc, on the other hand, defaults to a special gnu-style inlining, which uses different syntax.) Making the functions static makes sure clang doesn't decide to discard a function because it's too large to inline. Thanks to eli.friedman for the patch. Fixes http://code.google.com/p/webm/issues/detail?id=114 Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4	2010-08-09 09:36:44 -04:00
John Koleszar	cfb204eaf7	Merge "Issue 150: Fixing linker warning in extend.c."	2010-08-02 09:35:05 -07:00
Jan Kratochvil	0e8f108fb0	nasm: avoid space before the :data symbol type. global label:data ^^ Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021	2010-08-02 09:20:42 -04:00
Frank Galligan	062e6c1886	Removed two unused global variables. Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems because it was increasing the .bss by 1572864 bytes. Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87	2010-07-28 17:25:09 -04:00
Johann	56f5a9a060	update arm idct functions Jeff Muizelaar posted some changes to the idct/reconstruction c code. This is the equivalent update for the arm assembly. This shows a good boost on v6, and a minor boost on neon. Here are some numbers for highway in qcif, 2641 frames: HEAD neon: ~161 fps new neon: ~162 fps HEAD v6: ~102 fps new v6: ~106 fps The following functions have been updated for armv6 and neon: vp8_dc_only_idct_add vp8_dequant_idct_add vp8_dequant_dc_idct_add Conflicts: vp8/decoder/arm/armv6/dequantdcidct_v6.asm vp8/decoder/arm/armv6/dequantidct_v6.asm Resolved by removing these files. When I rewrote the functions, I also moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4	2010-07-26 08:55:19 -04:00
Justin Lebar	1d8277f8e8	Issue 150: Fixing linker warning in extend.c.	2010-07-23 16:42:25 -07:00
Jeff Muizelaar	98fcccfe97	Change the x86 idct functions to do reconstruction at the same time Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f	2010-07-23 15:21:36 -04:00
Jeff Muizelaar	b2fa74ac18	Combine idct and reconstruction steps This moves the prediction step before the idct and combines the idct and reconstruction steps into a single step. Combining them seems to give an overall decoder performance improvement of about 1%. Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318	2010-07-23 15:21:36 -04:00
Fritz Koenig	0ce3901282	Swap alt/gold/new/last frame buffer ptrs instead of copying. At the end of the decode, frame buffers were being copied. The frames are not updated after the copy, they are just for reference on later frames. This change allows multiple references to the same frame buffer instead of copying it. Changes needed to be made to the encoder to handle this. The encoder is still doing frame buffer copies in similar places where pointer reference could be done. Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66	2010-07-23 14:53:59 -04:00
Michael Kohler	1e23f45119	Fix misspelled "skiped" in onyxc_int.h to "skipped". Signed-off-by: Michael Kohler <michaelkohler@live.com>	2010-07-07 20:06:04 +02:00
Yunqing Wang	29d586b462	Add loopfilter initialization fix in multithreading code Modified loopfilter initialization to avoid unnecessary operations. Change-Id: I9fd1a5a49edc1cb8116c2a72a6908b1e437459ec	2010-06-30 09:42:39 -04:00
Yunqing Wang	bead039d4d	Improve SSE2 loopfilter functions Restructured and rewrote SSE2 loopfilter functions. Combined u and v into one function to take advantage of SSE2 128-bit registers. Tests on test clips showed a 4% decoder performance improvement on Linux desktop. Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987	2010-06-29 15:23:14 -04:00
Timothy B. Terriberry	9f81463454	Fix a linker error on x86-64 Linux when not using a version script. If the version script produced by the libvpx build system is not used when linking a shared library on x86-64 Linux, the constant data in the subpel filters produces R_X86_64_32 relocation errors due to the use of wrt rip addressing instead of wrt rip wrt ..gotpcrel. Instead of adding a new macro for this addressing mode, this patch sets the ELF visibility of these symbols to "hidden", which allows wrt rip addressing to work without a text relocation. This allows building a shared library without using the provided build system or a separate version script. Fixes http://code.google.com/p/webm/issues/detail?id=46 Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2	2010-06-21 08:19:12 -04:00
John Koleszar	94c52e4da8	cosmetics: trim trailing whitespace When the license headers were updated, they accidentally contained trailing whitespace, so unfortunately we have to touch all the files again. Change-Id: I236c05fade06589e417179c0444cb39b09e4200d	2010-06-18 13:06:11 -04:00
John Koleszar	c65e8e8e46	Merge "Change bitreader to use a larger window."	2010-06-17 18:08:36 -07:00
Timothy B. Terriberry	c17b62e1bd	Change bitreader to use a larger window. Change bitreading functions to use a larger window which is refilled less often. This makes it cheap enough to do bounds checking each time the window is refilled, which avoids the need to copy the input into a large circular buffer. This uses less memory and speeds up the total decode time by 1.6% on an ARM11, 2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64. Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving the refill loop to the front of vp8dx_decode_bool(). However, having the refill loop between computation of the split values and the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to memory latency and code size: refilling after normalization duplicates the code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases). Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the beginning of each decode step in vp8_decode_mb_tokens() means the latter requires an extra refill at the end. Platform-specific versions could avoid the problem, but would require most of detokenize.c to be duplicated. Change-Id: I16c782a63376f2a15b78f8086d899b987204c1c7	2010-06-15 19:55:14 -07:00
Yunqing Wang	397aad3ec2	More on "some XMM registers are non-volatile on windows x64 ABI" Add same fix in subpixel_sse2.asm. Change-Id: Icfda6103cbf74ec43308e96961dd738aa823c14d	2010-06-15 09:11:26 -04:00
Makoto Kato	63ea8705eb	some XMM registers are non-volatile on windows x64 ABI XMM6 to XMM15 are non-volatile on Windows x64 ABI. We have to save these registers. Change-Id: I4676309f1350af25c8a35f0c81b1f0499ab99076	2010-06-11 12:11:15 -04:00
Yunqing Wang	8389f1967c	Merge "Improve vp8_sixtap_predict functions"	2010-06-11 06:48:52 -07:00
Yunqing Wang	8873a93811	Improve vp8_sixtap_predict functions Restructure vp8_sixtap_predict functions to eliminate extra 5-line calculation while doing first-pass only. Also, combine functions to eliminate usage of intermediate buffer. This gives decoder a 3% performance gain on my test clips. Change-Id: I13de49638884d1a57d0855c63aea719316d08c1b	2010-06-10 11:48:48 -04:00
John Koleszar	3085025fa1	Remove secondary mv clamping from decode stage This patch removes the secondary MV clamping from the MV decoder. This behavior was consistent with limits placed on non-split MVs by the reference encoder, but was inconsistent with the MVs generated in the split case. The purpose of this secondary clamping was only to prevent crashes on invalid data. It was not intended to be a behaviour an encoder could or should rely on. Instead of doing additional clamping in a way that changes the entropy context, the secondary clamp is removed and the border handling is made implementation-specific. With respect to the spec, the border is treated as essentially infinite, limited only by the clamping performed on the near/nearest reference and the maximum encodable magnitude of the residual MV. This does not affect any currently produced streams. Change-Id: I68d35a2fbb51570d6569eab4ad233961405230a3	2010-06-09 11:47:24 -04:00
John Koleszar	09202d8071	LICENSE: update with latest text Change-Id: Ieebea089095d9073b3a94932791099f614ce120c	2010-06-04 16:19:40 -04:00
Yaowu Xu	cbf12db901	Merge "Remove un-necessary memory initialization"	2010-06-01 19:20:37 -07:00
Yaowu Xu	66f9864a38	Remove un-necessary memory initialization The intra prediction needs one line above at the top edge.	2010-05-29 22:59:31 -07:00
Luca Barbato	e7876abb2c	expose vp8_deblock it is used by vp8/encoder/onyx_if.c fixes: vp8/encoder/onyx_if.c:5189: warning: implicit declaration of function ‘vp8_deblock’	2010-05-28 10:37:43 +02:00
John Koleszar	b7492341ac	install includes in DIST_DIR/include/vpx, move vpx_codec/ to vpx/ This renames the vpx_codec/ directory to vpx/, to allow applications to more consistently reference these includes with the vpx/ prefix. This allows the includes to be installed in /usr/local/include/vpx rather than polluting the system includes directory with an excessive number of includes. Change-Id: I7b0652a20543d93f38f421c60b0bbccde4d61b4f	2010-05-24 20:27:42 -04:00
John Koleszar	1df0314e7b	configure: remove HAVE_CONFIG_H This doesn't play well with autotools, and the preprocessor magic is confusing and unhelpful in the vp8-only context. Change-Id: I2fcb57e6eb7876ecb58509da608dc21f26077ff1	2010-05-21 05:53:48 -04:00
John Koleszar	0ea50ce9cb	Initial WebM release	2010-05-18 11:58:33 -04:00
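Commit b9be7a464f (Fix loopfilter delta zero transitions) turns on a subtle point. The decoder keeps the last received delta until an update bit is set. So the encoder must compare against what it last transmitted, not against zero.

The following is a minimal sketch of the two decision rules. It is not the libvpx code. The names `delta_state`, `NUM_DELTAS`, `needs_update_if_nonzero`, `needs_update_if_changed` and `mark_sent` are hypothetical.

```c
#include <stdbool.h>

#define NUM_DELTAS 4 /* hypothetical count, e.g. one delta per reference frame */

/* Hypothetical encoder-side record of the delta values the decoder holds. */
typedef struct {
    signed char last_sent[NUM_DELTAS];
} delta_state;

/* Old rule (update-if-nonzero): resends unchanged values and never
 * signals a change back to zero, so the decoder keeps a stale value. */
static bool needs_update_if_nonzero(signed char value) {
    return value != 0;
}

/* Fixed rule (update-if-changed): signal only real changes, but always
 * signal in error-resilient mode, since earlier frames may have been lost. */
static bool needs_update_if_changed(const delta_state *s, int i,
                                    signed char value, bool error_resilient) {
    return error_resilient || value != s->last_sent[i];
}

/* Remember what was transmitted so the next frame compares against it. */
static void mark_sent(delta_state *s, int i, signed char value) {
    s->last_sent[i] = value;
}
```

With a previously sent value of 2, the old rule skips the transition to 0. The new rule sends it.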
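Commit 18dc92fd66 (Add 4-tap version of 2nd-pass ARMv6 MC filter) relies on a simple fact. When the outer taps of a 6-tap filter are zero, the two outer samples contribute nothing. They therefore need not be loaded or multiplied, which is also why valgrind stops flagging the uninitialized inputs.

Below is a scalar C sketch, not the ARMv6 assembly. The rounding (`+64`, `>>7`) assumes 7-bit coefficients that sum to 128, and the function names are hypothetical.

```c
#include <stdint.h>

#define FILTER_SHIFT 7                          /* assumed coefficient precision */
#define FILTER_ROUND (1 << (FILTER_SHIFT - 1))

static uint8_t clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Full 6-tap filter over src[-2] .. src[3]. */
static uint8_t filter6(const uint8_t *src, const int taps[6]) {
    int sum = 0;
    for (int k = 0; k < 6; k++)
        sum += src[k - 2] * taps[k];
    return clamp_u8((sum + FILTER_ROUND) >> FILTER_SHIFT);
}

/* 4-tap version, valid only when taps[0] == taps[5] == 0. It never reads
 * src[-2] or src[3], saving two loads and two multiplies per output. */
static uint8_t filter4(const uint8_t *src, const int taps[6]) {
    int sum = src[-1] * taps[1] + src[0] * taps[2]
            + src[1] * taps[3] + src[2] * taps[4];
    return clamp_u8((sum + FILTER_ROUND) >> FILTER_SHIFT);
}
```

For a filter such as {0, -6, 123, 12, -1, 0}, both functions produce identical output. A caller would pick `filter4` only after checking that the outer taps are zero.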
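Commit f30e8dd7bd (combine max values and compare once) applies a general idea: accumulate the differences first, then compare once. Commit 8eae7fe7e8 applies the same idea with `pmaxub` in SSE2.

Here is a scalar C illustration of the two strategies. It is not the assembly itself, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Compare-each approach: one compare plus one AND per difference. */
static bool mask_per_compare(const int *diffs, int n, int limit) {
    bool ok = true;
    for (int i = 0; i < n; i++)
        ok = ok && (abs(diffs[i]) <= limit);
    return ok;
}

/* Accumulate-first approach: track the largest magnitude, then
 * perform a single comparison against the limit. */
static bool mask_max_then_compare(const int *diffs, int n, int limit) {
    int max = 0;
    for (int i = 0; i < n; i++) {
        int d = abs(diffs[i]);
        if (d > max)
            max = d;
    }
    return max <= limit;
}
```

Both return the same mask. In SIMD code, the second form replaces many compares and ANDs with max operations and one final compare.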
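Commit 3fb37162a8 (Bilinear subpixel optimizations for ssse3) uses `pmaddubsw`. That instruction multiplies unsigned bytes by signed byte coefficients and adds adjacent pairs. This lets one instruction apply both bilinear taps.

The scalar sketch below shows the arithmetic being vectorized. It assumes taps that sum to 128 with `+64` rounding and a `>>7` shift, and `bilinear_row` is a hypothetical name.

```c
#include <stdint.h>

/* Horizontal bilinear filter over one row: each output is a weighted
 * average of two neighbouring pixels. This multiply-add pair is what
 * pmaddubsw computes for many pixels at once. Reads src[0..width]. */
static void bilinear_row(const uint8_t *src, uint8_t *dst, int width,
                         int f0, int f1) {
    for (int i = 0; i < width; i++)
        dst[i] = (uint8_t)((src[i] * f0 + src[i + 1] * f1 + 64) >> 7);
}
```

For example, with taps (96, 32), the inputs 100 and 200 give (9600 + 6400 + 64) >> 7 = 125.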
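Commit 392a958274 (avoid negative array subscript warnings) concerns token tables indexed from a non-zero base such as NEARESTMV. A common way to handle such tables is to subtract the base when indexing. Writing `(table - NEARESTMV)[mode]` instead is what provokes negative-subscript warnings.

The sketch below is only an illustration. The commit does not say how it was fixed, and the enum ordering and code values here are given for illustration only.

```c
/* Illustrative mode numbering: inter modes follow the intra modes,
 * so NEARESTMV is not zero. */
typedef enum {
    DC_PRED, V_PRED, H_PRED, TM_PRED, B_PRED,
    NEARESTMV, NEARMV, ZEROMV, NEWMV, SPLITMV
} mb_mode;

typedef struct { int value; int len; } token;

/* One entry per inter mode, stored from index 0 (illustrative codes). */
static const token mv_ref_encodings[SPLITMV - NEARESTMV + 1] = {
    {2, 2},  /* NEARESTMV */
    {6, 3},  /* NEARMV    */
    {0, 1},  /* ZEROMV    */
    {14, 4}, /* NEWMV     */
    {15, 4}  /* SPLITMV   */
};

/* Subtract the base from the index rather than from the table pointer,
 * so no negative subscript ever appears. */
static const token *mv_ref_token(mb_mode m) {
    return &mv_ref_encodings[m - NEARESTMV];
}
```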
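Commits b2fa74ac18 and 98fcccfe97 move prediction ahead of the inverse transform, so the transform output can be added straight onto the predictor. Commit 56f5a9a060 lists `vp8_dc_only_idct_add` among the updated functions.

The sketch below covers the simplest, DC-only case. It is not that function. Its name is hypothetical, and the `(dc + 4) >> 3` scaling is an assumption for illustration.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* DC-only inverse transform fused with reconstruction. With only a DC
 * coefficient, the 4x4 residual is a constant, so it is computed once and
 * added to the prediction with clamping. No intermediate residual buffer
 * is written. */
static void dc_only_idct_add(short dc, const uint8_t *pred, int pred_stride,
                             uint8_t *dst, int dst_stride) {
    int a1 = (dc + 4) >> 3;   /* assumed DC scaling */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[r * dst_stride + c] =
                clamp_u8(pred[r * pred_stride + c] + a1);
}
```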
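Commit 0ce3901282 (Swap alt/gold/new/last frame buffer ptrs instead of copying) replaces frame copies with shared references. This is a hypothetical sketch of the idea: reference frames become pointers into a small pool. Refreshing a reference is then a pointer assignment, and each new frame is decoded into a buffer no reference currently points at.

All type and function names are assumptions, not libvpx's.

```c
#include <stddef.h>

#define NUM_BUFS 4  /* three references plus one free buffer always suffice */

typedef struct { int id; /* pixel planes would live here */ } frame_buffer;

typedef struct {
    frame_buffer *new_frame, *last, *golden, *altref;
} ref_frames;

/* Pick a pool buffer that no reference points at, to decode into. */
static frame_buffer *find_free(frame_buffer pool[NUM_BUFS], const ref_frames *r) {
    for (int i = 0; i < NUM_BUFS; i++) {
        frame_buffer *b = &pool[i];
        if (b != r->last && b != r->golden && b != r->altref)
            return b;
    }
    return NULL;
}

/* After decoding, refresh references by pointer instead of copying pixels.
 * Several references may end up sharing one buffer. */
static void update_refs(ref_frames *r, int refresh_golden,
                        int refresh_alt, int refresh_last) {
    if (refresh_golden) r->golden = r->new_frame;
    if (refresh_alt)    r->altref = r->new_frame;
    if (refresh_last)   r->last   = r->new_frame;
}
```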
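Commit c17b62e1bd (Change bitreader to use a larger window) makes bounds checks cheap by paying for them only on refill. The sketch below shows that refill idea with a generic MSB-first bit reader and a 64-bit window. This is deliberately not VP8's boolean (arithmetic) decoder; it only illustrates the refill strategy.

All names are hypothetical. Past the end of the input, it supplies zero bits rather than reading out of bounds.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *pos, *end;
    uint64_t window;   /* unread bits, left-aligned */
    int count;         /* number of valid bits in window */
} bit_reader;

/* Top up the window a byte at a time until it holds more than 56 bits.
 * The only bounds check happens here, so per-bit reads stay cheap. */
static void refill(bit_reader *br) {
    while (br->count <= 56) {
        uint64_t byte = (br->pos < br->end) ? *br->pos++ : 0;
        br->window |= byte << (56 - br->count);
        br->count += 8;
    }
}

static void br_init(bit_reader *br, const uint8_t *buf, size_t len) {
    br->pos = buf;
    br->end = buf + len;
    br->window = 0;
    br->count = 0;
    refill(br);
}

/* Read n bits (1 <= n <= 32), refilling only when the window runs low. */
static uint32_t read_bits(bit_reader *br, int n) {
    if (br->count < n)
        refill(br);
    uint32_t v = (uint32_t)(br->window >> (64 - n));
    br->window <<= n;
    br->count -= n;
    return v;
}
```

Because the input is read in place with a bounds check per refill, no circular copy of the input buffer is needed.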


473 Commits