generic-library/vpx

Author	SHA1	Message	Date
John Koleszar	f103dcefaf	RTCD: add subpixel functions This commit continues the process of converting to the new RTCD system. Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe	2012-01-30 12:08:29 -08:00
John Koleszar	2a8f57f50d	RTCD: add postproc functions This commit continues the process of converting to the new RTCD system. Change-Id: If54eb5cb5d1b0cac6c4c0633a9e99c93ca860ba2	2012-01-30 12:08:29 -08:00
John Koleszar	fdb61a4531	RTCD: add recon functions This commit continues the process of converting to the new RTCD system. Change-Id: I9bfcf9bef65c3d4ba0fb9a3e1532bad1463a10d6	2012-01-30 12:08:28 -08:00
John Koleszar	ab77b4e898	RTCD: add remaining IDCT functions This commit continues the process of converting to the new RTCD system. Change-Id: I03c4dbf30dfd3558b0e256ff9d3ff4c012aadc80	2012-01-30 12:08:22 -08:00
John Koleszar	55f74c59c7	RTCD: add loopfilter functions This commit continues the process of converting to the new RTCD system. Change-Id: Ic8a4047d72ff3a54ec98977dd90e70c13213db71	2012-01-30 12:06:31 -08:00
John Koleszar	a910049aea	New RTCD implementation This is a proof of concept RTCD implementation to replace the current system of nested includes, prototypes, INVOKE macros, etc. Currently only the decoder specific functions are implemented in the new system. Additional functions will be added in subsequent commits. Overview: RTCD "functions" are implemented as either a global function pointer or a macro (when only one eligible specialization available). Functions which have RTCD specializations are listed using a simple DSL identifying the function's base name, its prototype, and the architecture extensions that specializations are available for. Advantages over the old system: - No INVOKE macros. A call to an RTCD function looks like an ordinary function call. - No need to pass vtables around. - If there is only one eligible function to call, the function is called directly, rather than indirecting through a function pointer. - Supports the notion of "required" extensions, so in combination with the above, on x86_64 if the best function available is sse2 or lower it will be called directly, since all x86_64 platforms implement sse2. - Elides all references to functions which will never be called, which could reduce binary size. For example if sse2 is required and there are both mmx and sse2 implementations of a certain function, the code will have no link time references to the mmx code. - Significantly easier to add a new function, just one file to edit. Disadvantages: - Requires global writable data (though this is not a new requirement) - 1 new generated source file. Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55	2012-01-30 12:06:27 -08:00
Jim Bankoski	ed208f7d2f	vp8d - valgrind warnings in mb post processor Solved by extending the border in the postproc buffer as necessary Change-Id: Ic3f61397fe5bc8e4db6fc78050b0b160bd0aee86	2012-01-17 17:27:39 -08:00
John Koleszar	66da859e5e	Merge "Reduced the size of Y1Dequant and friends to [128][2]"	2012-01-06 11:59:06 -08:00
Scott LaVarnway	5f25d4c175	Reduced the size of Y1Dequant and friends to [128][2] This patch removes the local copies of the dequantize constants and implements John's idea as described in "Make a local copy of the dequantized data" commit. Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2	2012-01-06 11:12:00 -08:00
Scott LaVarnway	89cdfdb231	Merge "SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}()"	2012-01-05 09:05:19 -08:00
Scott LaVarnway	77119a5cd8	Merge "Improved sse2 version of simple loopfilter"	2012-01-04 13:26:13 -08:00
John Koleszar	0c2f8e77cc	Remove useless g_common.h This file declared a bunch of nonexistent, unreferenced global function pointers. Change-Id: Ic26bb8c7712deba754c49fc01f383b53afc9e728	2011-12-21 15:02:23 -08:00
Scott LaVarnway	1d7d18c69c	Improved sse2 version of simple loopfilter Change-Id: Iae406d16fab5bace47fbcf5ef7ed021f08af159d	2011-12-21 12:52:18 -05:00
Scott LaVarnway	a53d5a4c44	Moved dequant idct into common These functions are now used by the encoder. This is WIP with the goal of creating a common idct/add for the encoder and decoder. A boost of 1.8% was seen for the HD rt test clip used. [Tero] Added needed changes to ARM side. Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf	2011-12-15 14:23:41 -05:00
Scott LaVarnway	9fa6132fc5	Improved mmx/sse2 versions of iwalsh Removed unnecessary transposes. Change-Id: I029fbaf8afafee34d54a4f3333c22023c15003c3	2011-12-08 14:37:59 -05:00
Scott LaVarnway	f46e17fd6f	Merge "Modified the inverse walsh to output directly"	2011-11-28 07:26:07 -08:00
Scott LaVarnway	4a91541c94	Modified the inverse walsh to output directly to the dqcoeff or qcoeff buffer. The encoder would populate the dc coeffs of the y blocks as a separate stage (recon_dcblock) and the decoder would use a special version of the idct. This change eliminates the extra copy and reduces the code footprint. [Tero] Added needed changes to armv6 and NEON assembly. Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba	2011-11-25 09:24:04 +02:00
Johann	f2cd4ded22	Move shared data to shared location Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2 file is bad Moving towards allowing --disable-mmx Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742	2011-11-18 16:23:14 -08:00
Scott LaVarnway	df49c7c58d	SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}() Ronald recently sent me this patch that he did in April. > From: Ronald S. Bultje <rbultje@google.com> > Date: Thu, 28 Apr 2011 17:30:15 -0700 > Subject: [PATCH] SSE2 optimizations for > vp8_build_intra_predictors_mby{,_s}(). HD decode tests have shown a performance boost up to 1.5%, depending on material. Patch set 3: Fixed encoder crash. Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8	2011-11-09 15:30:35 -05:00
Scott LaVarnway	ed9c66f584	Remove usage of predict buffer for decode Instead of using the predict buffer, the decoder now writes the predictor into the recon buffer. For blocks with eob=0, unnecessary idcts can be eliminated. This gave a performance boost of ~1.8% for the HD clips used. Tero: Added needed changes to ARM side and scheduled some assembly code to prevent interlocks. Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3) into this commit because of similarities in the idct functions. Patch Set 7: EC bug fix. Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6	2011-10-18 12:06:50 -04:00
Johann	3556deaca3	combine loopfilter data access The data processed by the loopfilter overlaps. At the block level, this results in some redundant transforms. Grouping the filtering allows for a single 16x16 transpose (and inversion) instead of three 16x8 transposes (and three more inversions). This implementation is x86_64 only. We retain the previous implementation for x86. Improvements are obviously material dependent, but it seems to be ~1% in tests here. Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007	2011-09-30 07:38:35 -07:00
Attila Nagy	1a7d25a484	Replace vpx_ports/config.h with vpx_config.h Just a clean-up. Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a	2011-09-22 13:33:54 +03:00
Fritz Koenig	112bd4e2b4	Fix naming of sse2 idct functions. Prepend idct function names with vp8_ so that under profiling they show up associated with libvpx. Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4	2011-08-24 10:25:32 -07:00
Johann	85358d04cd	Fix data accesses for simple loopfilters The data that the simple horizontal loopfilter reads is aligned, treat it accordingly. For the vertical, we only use the bottom 4 bytes, so don't read in 16 (and incur the penalty for unaligned access). This shows a small improvement on older processors which have a significant penalty for unaligned reads. postproc_mmx.c is unused Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d	2011-08-23 20:42:45 -04:00
Fritz Koenig	c5f890af2c	Use local labels for jumps/loops in x86 assembly. Prepend . to local labels in assembly code. This allows non unique labels within a file. Also makes profiling information more informative by keeping the function name with the loop name. Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f	2011-08-23 09:05:29 -07:00
Johann	01433c5043	update x86 asm for loopfilter Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce	2011-07-08 09:23:38 -04:00
Ronald S. Bultje	c8a23ad3f4	Properly use GET_GOT/RESTORE_GOT when using GLOBAL(). This should fix binaries using PIC on x86-32. Also should fix issue 343. Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02	2011-06-30 14:04:27 -07:00
Scott LaVarnway	914f7c36d7	Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3."	2011-05-19 11:22:01 -07:00
Johann	df2023a6cb	set up Global Offset Table in recon global values were being referenced, but the GOT was not being set up. as the GOT is only required for PIC, this issue wasn't caught in the default configuration. Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f http://code.google.com/p/webm/issues/detail?id=328	2011-05-10 15:58:56 -04:00
Johann	a7d4d3c550	clean up unused variable warnings Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b	2011-05-09 12:56:20 -04:00
Thijs Vermeir	8942f70cdf	Fix documentation typos Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce	2011-04-30 09:34:59 +02:00
Ronald S. Bultje	5a23352c03	Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3. Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035	2011-04-29 11:52:09 -07:00
James Berry	f10732554b	bug fix removed inline from recon_wrapper_sse2.c removed inline from recon_wrapper_sse2.c to build for Visual Studio Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22	2011-04-28 15:12:00 -04:00
Ronald S. Bultje	1e7ded69cf	Use psadbw to get the sum of bytes in a line. Thanks Jason for pointing that out on #vp8. ;-). Change-Id: I5330a753e752a8704b78a409597472628e0b26a5	2011-04-27 13:49:21 -07:00
Ronald S. Bultje	1083fe4999	SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}(). decoding before 10.425 10.432 10.423 =10.426 after: 10.405 10.416 10.398 =10.406, 0.2% faster encoding before 14.252 14.331 14.250 14.223 14.241 14.220 14.221 =14.248 after 14.095 14.090 14.085 14.095 14.064 14.081 14.089 =14.086, 1.1% faster Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2	2011-04-27 11:31:27 -07:00
Johann	01527e743f	remove simpler_lpf the decision to run the regular or simple loopfilter is made outside the function and managed with pointers stop tracking the option in two places. use filter_type exclusively Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15	2011-04-25 17:37:41 -04:00
Johann	4a2b684ef4	modify SAVE_XMM for potential 64bit use the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and if the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867	2011-04-19 10:42:45 -04:00
Johann	c7cfde42a9	Add save/restore xmm registers in x86 assembly code Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877	2011-04-18 16:30:38 -04:00
Johann	487c0299c9	remove dead code, add missing RESTORE_XMM vp8_filter_block1d16_h4_ssse3 was never called because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was masked. however, if/when win64 used those registers for persistent data, issues could/will arise. Change-Id: I56d6effca0aeba1f86082689771cb10145d39651	2011-04-15 10:11:53 -04:00
John Koleszar	a9ce3e3834	Remove unused files Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08	2011-04-11 10:34:40 -04:00
John Koleszar	429dc676b1	Increase static linkage, remove unused functions A large number of functions were defined with external linkage, even though they were only used from within one file. This patch changes their linkage to static and removes the vp8_ prefix from their names, which should make it more obvious to the reader that the function is contained within the current translation unit. Functions that were not referenced were removed. These symbols were identified by: $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' | sort | grep '^ *1 ' Change-Id: I59609f58ab65312012c047036ae1e0634f795779	2011-03-17 20:53:47 -04:00
John Koleszar	02321de0f2	Fix relative include paths Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c	2011-02-10 15:09:44 -05:00
Gaute Strokkenes	ffc6aeef14	Remove duplicate loopfilter parameters. Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8	2011-02-04 14:55:02 +00:00
Timothy B. Terriberry	c4d7e5e67e	Eliminate more warnings. This eliminates a large set of warnings exposed by the Mozilla build system (Use of C++ comments in ISO C90 source, commas at the end of enum lists, a couple incomplete initializers, and signed/unsigned comparisons). It also eliminates many (but not all) of the warnings exposed by newer GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite without checking the return values). There are a few spurious warnings left on my system: ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used uninitialized in this function gcc seems to be unable to figure out that the value shortcut doesn't change between the two if blocks that test it here. ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned expression >= 0 is always true ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned expression >= 0 is always true This is true, so far as it goes, but it's comparing against an enum, and the C standard does not mandate that enums be unsigned, so the checks can't be removed. Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395	2010-10-27 18:08:04 -07:00
Jan Kratochvil	1fc294116a	nasm: movhps compatibility QWORD->MMWORD Filed for nasm as: https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208 nasm just does not accept any size parameter for movhps: 1.asm:2: error: mismatch in operand sizes Some parts of libvpx already use MMWORD for movhps and MMWORD is defined-out so it is compatible both with yasm and nasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc	2010-10-04 20:47:19 -04:00
Jan Kratochvil	5cdc3a4c29	nasm: address labels 'rel label' vice 'wrt rip' nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50	2010-10-04 19:47:54 -04:00
Jan Kratochvil	e114f699f6	nasm: match instruction length (movd/movq) to parameters nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91	2010-10-04 23:36:29 +02:00
Fritz Koenig	0964ef0e71	Optimizations on the loopfilters. - Scheduling for Atom processors - Combining of macros to allow for better interleaving - Change from multiplies to adds for main filter - Use of movhps/movlps to fill xmm registers without shifting and orring Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b	2010-09-28 12:01:34 -07:00
Fritz Koenig	b7dc9398f2	Use movq instead of movdqu. Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f	2010-09-20 11:34:26 -07:00
Fritz Koenig	8eae7fe7e8	Better choice of instruction filter mask comparison. Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82	2010-09-20 10:20:38 -07:00
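
The "New RTCD implementation" commit (a910049aea) describes RTCD functions as either a global function pointer or, when only one specialization is eligible, a macro that resolves to a direct call. The following is a minimal sketch of that idea in plain C, not the header libvpx actually generates. Every name in it (demo_add, demo_sub, DEMO_HAS_SSE2, demo_rtcd_init) is hypothetical, and the "sse2" variant is an ordinary C stand-in rather than real SIMD code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability flag; a real build would detect this at run time. */
#define DEMO_HAS_SSE2 0x1

/* Two eligible specializations of the same hypothetical function. */
static int demo_add_c(int a, int b) { return a + b; }
static int demo_add_sse2(int a, int b) { return a + b; } /* stand-in for an optimized version */

/* Several eligible specializations: callers go through a global
 * function pointer, so the call site reads like an ordinary call. */
int (*demo_add)(int, int) = demo_add_c;

/* Only one eligible specialization: a macro maps the RTCD name
 * straight to the function, so there is no indirection at all. */
static int demo_sub_c(int a, int b) { return a - b; }
#define demo_sub demo_sub_c

/* One-time setup: start from the C version, then upgrade the pointer
 * if the (hypothetical) flags say a better specialization is usable. */
void demo_rtcd_init(int flags) {
    demo_add = demo_add_c;
    if (flags & DEMO_HAS_SSE2) demo_add = demo_add_sse2;
    assert(demo_add != NULL);
}
```

After calling demo_rtcd_init(DEMO_HAS_SSE2), demo_add(2, 3) dispatches through the pointer to the "sse2" stand-in, while demo_sub(5, 3) compiles to a direct call to demo_sub_c. Neither call site needs an INVOKE macro or a vtable argument, which is the advantage the commit message lists first.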

71 Commits