generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	4a91541c94	Modified the inverse walsh to output directly to the dqcoeff or qcoeff buffer. The encoder would populate the dc coeffs of the y blocks as a separate stage (recon_dcblock) and the decoder would use a special version of the idct. This change eliminates the extra copy and reduces the code footprint. [Tero] Added needed changes to armv6 and NEON assembly. Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba	2011-11-25 09:24:04 +02:00
Johann	f2cd4ded22	Move shared data to shared location Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2 file is bad Moving towards allowing --disable-mmx Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742	2011-11-18 16:23:14 -08:00
John Koleszar	bdd35c13cc	avoid resetting framerate during vpx_codec_enc_config_set() The calculated frame_rate is a state variable in the codec, and shouldn't be maintained in the configuration struct. Move it to the main part of cpi so that it isn't clobbered when the configuration struct is updated. The initial framerate estimate is moved from the vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so that it is only called once and not reset on every call to vp8_change_config(). Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465	2011-11-11 14:45:58 -08:00
Scott LaVarnway	df49c7c58d	SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}() Ronald recently sent me this patch that he did in April. > From: Ronald S. Bultje <rbultje@google.com> > Date: Thu, 28 Apr 2011 17:30:15 -0700 > Subject: [PATCH] SSE2 optimizations for > vp8_build_intra_predictors_mby{,_s}(). HD decode tests have shown a performance boost up to 1.5%, depending on material. Patch set 3: Fixed encoder crash. Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8	2011-11-09 15:30:35 -05:00
Scott LaVarnway	9532bda0fb	Merge "Relocated idct/add calls for encoder"	2011-11-09 10:17:43 -08:00
Johann	ea2229bab6	Merge "ARMv6 optimized Intra4x4 prediction"	2011-11-09 09:36:33 -08:00
John Koleszar	82e8884ad8	Merge "Remove unused file recon.c"	2011-11-09 09:31:23 -08:00
Scott LaVarnway	861ed6a5c1	Relocated idct/add calls for encoder Call the idct/add after the tokenize. This is WIP with the goal of creating a common idct/add for the encoder and decoder. This move is necessary because the decoder's version of the idct clobbers qcoeff, which is used by the tokenize. Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756	2011-11-09 10:41:05 -05:00
Tero Rintaluoma	5a2fd63a2a	ARMv6 optimized Intra4x4 prediction Added ARM optimized intra 4x4 prediction - 2x faster on Profiler compared to C-code compiled with -O3 - Function interface changed a little to improve BLOCKD structure access Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54	2011-11-09 09:13:51 +02:00
James Zern	9d60506130	threading: avoid defining _WIN32_WINNT The referenced function (SignalObjectAndWait) isn't used. Reduces the warnings with mingw32-w64 which defines this. Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210	2011-11-08 18:50:45 -08:00
John Koleszar	f89e109f56	Remove unused file recon.c File not referenced from anywhere and no longer compiles. Change-Id: I38b11bd60db615c2c2c9d7ad35caba3a1adf1750	2011-11-08 15:54:56 -08:00
James Zern	f89ea3432f	fix file permissions all of googletest import (0ab00a22) was marked executable Change-Id: Id7b7ee03efc21ab998bb03349bd91644e8af25da	2011-11-04 18:50:35 -07:00
John Koleszar	7ca6c91732	Merge "Changing decoder input partition API to input fragments."	2011-11-04 09:36:27 -07:00
Scott LaVarnway	46639567a0	Merge "Change use of eob in the encoder"	2011-11-03 08:06:06 -07:00
Tero Rintaluoma	e4f2ec7a52	Change use of eob in the encoder Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and decoder will use eobs[25] array from MACROBLOCKD structure. In future, this will enable use of the decoder side IDCT in the encoder. Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978	2011-11-03 16:08:09 +02:00
Stefan Holmer	1427205215	Changing decoder input partition API to input fragments. Adding support for several partitions within one input fragment. This is necessary to fully support all possible packetization combinations in the VP8 RTP profile. Several partitions can be transmitted in the same packet, and they can only be split by reading the partition lengths from the bitstream. Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463	2011-11-01 14:44:37 -07:00
Scott LaVarnway	e0309e1509	Merge "Improved decode_split_mv()"	2011-10-28 09:27:17 -07:00
Scott LaVarnway	6064384d59	Improved decode_split_mv() Tests showed ~1.2% performance boost on the HD clip used. Performance will vary based on material. Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a	2011-10-27 11:26:30 -04:00
Scott LaVarnway	0db5599957	Merge "Improved mv_bias"	2011-10-27 06:14:00 -07:00
Scott LaVarnway	21970d1dc2	Improved mv_bias Small performance gains. Change-Id: I709b9390a8a27a70f5f23574313b8db85ac7f23d	2011-10-26 11:46:10 -04:00
Attila Nagy	de82809444	Reduce partial frame copy in encoder's pick_filter_level_fast The partial frame copy function used to copy an extra 8 lines above and below. The partial frame filtering can only modify 3 pixel rows above the partial frame. Reduce copy to bare minimum needed, which is 4 lines, so that partial filtering on copied frame is possible. Define the "magic" fraction number for partial filtering in loopfilter.h . Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec	2011-10-26 15:25:07 +03:00
James Berry	bc7151131d	Fix: check cx_data buffer prior to write check to make sure that cx_data buffer has enough room before writing to it; prior behavior did not, which could result in a crash. Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28	2011-10-20 15:55:00 -04:00
Scott LaVarnway	ed9c66f584	Remove usage of predict buffer for decode Instead of using the predict buffer, the decoder now writes the predictor into the recon buffer. For blocks with eob=0, unnecessary idcts can be eliminated. This gave a performance boost of ~1.8% for the HD clips used. Tero: Added needed changes to ARM side and scheduled some assembly code to prevent interlocks. Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3) into this commit because of similarities in the idct functions. Patch Set 7: EC bug fix. Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6	2011-10-18 12:06:50 -04:00
Adrian Grange	04182a121a	Merge "Added rate-targeted temporal scalability"	2011-10-11 12:54:52 -07:00
Adrian Grange	217591fde5	Added rate-targeted temporal scalability Added the ability to create rate-targeted, temporally scalable, VP8 compatible bitstreams. The application vp8_scalable_patterns.c demonstrates how to use this capability. Users can create output bitstreams containing up to 5 temporally separable streams encoded as a single VP8 bitstream. (previously abandoned as: I92d1483e887adb274d07ce9e567e4d0314881b0a) Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a	2011-10-11 12:49:12 -07:00
James Berry	05bde9d4a4	bug fix - starting/optimal/max and buffer_level changed from int to int64_t buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level and maximum_buffer_size in VP8_CONFIG changed from int to int64_t to avoid potential crash issues for larger target bit rates. Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26	2011-10-10 12:16:55 -04:00
Scott LaVarnway	af12c23e8e	Merge "Improved tokenize"	2011-10-04 09:57:42 -07:00
Johann	2aa408524c	Merge "Reduce computational complexity of generic C loop filter."	2011-09-30 16:17:56 -07:00
Scott LaVarnway	ab00d209bc	Improved tokenize For realtime HD encodings, up to 1.6% gains seen. Change-Id: If45028e23db95124da63f9d38ffe06e05596cc6e	2011-09-30 12:49:46 -04:00
Johann	3556deaca3	combine loopfilter data access The data processed by the loopfilter overlaps. At the block level, this results in some redundant transforms. Grouping the filtering allows for a single 16x16 transpose (and inversion) instead of three 16x8 transposes (and three more inversions). This implementation is x86_64 only. We retain the previous implementation for x86. Improvements are obviously material dependent, but it seems to be ~1% in tests here. Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007	2011-09-30 07:38:35 -07:00
Aaron Watry	69aa303d96	Reduce computational complexity of generic C loop filter. Change-Id: I1e7f9ed3cd907844a495b9e0073bc140b87e5c06	2011-09-29 17:25:48 -05:00
John Koleszar	6f9457ec12	Merge "clamp_mvs() using the wrong motion vector information"	2011-09-22 11:54:15 -07:00
Attila Nagy	1a7d25a484	Replace vpx_ports/config.h with vpx_config.h Just a clean-up. Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a	2011-09-22 13:33:54 +03:00
Scott LaVarnway	c0ee870b0a	clamp_mvs() using the wrong motion vector information In the "Removed bmi copy to/from BLOCKD" commit, the copy to the bmi in BLOCKD was eliminated. The clamp_mvs() used the bmi in BLOCKD, which now contains incorrect values. This patch fixes this problem. Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff	2011-09-16 11:03:53 -04:00
Scott LaVarnway	222c72e50f	Merge "Removed bmi copy to/from BLOCKD"	2011-08-31 06:57:20 -07:00
Scott LaVarnway	b870947d42	Removed bmi copy to/from BLOCKD for SPLITMV and B_PRED modes. Modified code to use the bmi found in mode_info_context instead of BLOCKD. On the decode side, the uvmvs are calculated only when required, instead of every macroblock. This is WIP. (bmi should eventually be removed from BLOCKD) Small performance gains noticed for RT encodes and decodes.(VGA) Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7	2011-08-24 14:42:26 -04:00
Fritz Koenig	112bd4e2b4	Fix naming of sse2 idct functions. Prepend idct function names with vp8_ so that under profiling they show up associated with libvpx. Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4	2011-08-24 10:25:32 -07:00
Scott LaVarnway	1de5da80c9	Merge "Faster vp8_default_coef_probs"	2011-08-24 07:52:10 -07:00
Johann	85358d04cd	Fix data accesses for simple loopfilters The data that the simple horizontal loopfilter reads is aligned, treat it accordingly. For the vertical, we only use the bottom 4 bytes, so don't read in 16 (and incur the penalty for unaligned access). This shows a small improvement on older processors which have a significant penalty for unaligned reads. postproc_mmx.c is unused Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d	2011-08-23 20:42:45 -04:00
Fritz Koenig	c5f890af2c	Use local labels for jumps/loops in x86 assembly. Prepend . to local labels in assembly code. This allows non unique labels within a file. Also makes profiling information more informative by keeping the function name with the loop name. Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f	2011-08-23 09:05:29 -07:00
John Koleszar	edec5eb5e7	Merge "Copy less when active map is in use"	2011-08-19 07:31:00 -07:00
Alpha Lam	4e8d35a461	Copy less when active map is in use When active map is specified and the current frame is not a key frame, golden frame nor an altref frame then copy only those active regions. This significantly reduces encoding time by as much as 19% on the test system where realtime encoding is used. This is particularly useful when the frame size is large (e.g. 2560x1600) and there's only a few action macroblocks. Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4	2011-08-19 10:29:41 -04:00
Scott LaVarnway	19987dcbfa	Faster vp8_default_coef_probs Copies from a generated table instead of building the default coeff probabilities during runtime. Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50	2011-08-16 16:21:21 -04:00
John Koleszar	e96131705a	Revert "Improved 1-pass CBR rate control" This reverts commit `b5ea2fbc2c`. Further testing showed noticeable keyframe popping in some cases, reverting this for now to give time for a proper fix. Conflicts: vp8/encoder/onyx_if.c vp8/encoder/ratectrl.c Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af	2011-08-12 14:51:36 -04:00
Johann	30e5deae5d	update extend frame borders the neon code made several assumptions which were broken by a recent change: https://review.webmproject.org/2676 update the code with new assumptions and guard them with a compile time assert Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0	2011-08-02 19:26:46 -04:00
John Koleszar	06c3d5bb9a	Fix building with --disable-postproc Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9	2011-08-01 17:50:23 -04:00
James Zern	b45065d38b	cosmetics: consistently use [u]int64_t Removes mixed usage of (unsigned) long long and INT64. Fixes Issue #208. Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a	2011-07-26 11:34:36 -07:00
Yunqing Wang	65dfcf4696	Use CONFIG_FAST_UNALIGNED consistently in codec CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is not supported by hardware. Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab	2011-07-25 10:11:24 -04:00
Johann	773bcc300d	Merge "fix sharpness bug and clean up"	2011-07-22 09:34:55 -07:00
Johann	a04ed0e8f3	fix sharpness bug and clean up sharpness was not recalculated in vp8cx_pick_filter_level_fast remove last_filter_type. all values are calculated, don't need to update the lfi data when it changes. always use cm->sharpness_level. the extra indirection was annoying. don't track last frame_type or sharpness_level manually. frame type only matters for motion search and sharpness_level is taken care of in frame_init move function declarations to their proper header Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db	2011-07-22 12:33:57 -04:00
Yunqing Wang	829179e888	Merge "Preload reference area to an intermediate buffer in sub-pixel motion search"	2011-07-22 06:56:15 -07:00
Yunqing Wang	20bd1446c0	Preload reference area to an intermediate buffer in sub-pixel motion search In sub-pixel motion search, the search range is small(+/- 3 pixels). Preload whole search area from reference buffer into a 32-byte aligned buffer. Then in search, load reference data from this buffer instead. This keeps data in cache, and reduces the crossing cache-line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux) showed encoder speed improvement: 3.4% at --rt --cpu-used =-4 2.8% at --rt --cpu-used =-3 2.3% at --rt --cpu-used =-2 2.2% at --rt --cpu-used =-1 Test on Atom notebook showed only 1.1% speed improvement(speed=-4). Test on Xeon machine also showed less improvement, since unaligned data access latency is greatly reduced in newer cores. Next, I will apply similar idea to other 2 sub-pixel search functions for encoding speed > 4. Make this change exclusively for x86 platforms. Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f	2011-07-22 09:28:06 -04:00
Scott LaVarnway	a25f6a9c88	Moved vp8_encode_bool into boolhuff.h allowing the compiler to inline this function. For real-time encodes, this gave a boost of 1% to 2.5%, depending on the speed setting. Change-Id: I3929d176cca086b4261267b848419d5bcff21c02	2011-07-19 09:17:25 -04:00
John Koleszar	b5ea2fbc2c	Improved 1-pass CBR rate control This patch attempts to improve the handling of CBR streams with respect to the short term buffering requirements. The "buffer level" is changed to be an average over the rc buffer, rather than a long running average. Overshoot is also tracked over the same interval and the golden frame targets suppressed accordingly to correct for overly aggressive boosting. Testing shows that this is fairly consistently positive in one metric or another -- some clips that show significant decreases in quality have better buffering characteristics, others show improvements in both. Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920	2011-07-18 11:48:05 -04:00
James Berry	6b6f367c3d	bug fix vpx_copy_and_extend_frame size issue vpx_copy_and_extend_frame could incorrectly resize uv frames which could result in a crash. Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898	2011-07-14 15:58:15 -04:00
Johann	211694f67e	Merge "update x86 asm for loopfilter"	2011-07-13 04:10:03 -07:00
Johann	8f910594bd	Merge "Update armv6 loopfilter to new interface"	2011-07-13 04:09:55 -07:00
Johann	1a219c22b1	Merge "Update armv7 loopfilter to new interface"	2011-07-13 04:09:42 -07:00
Johann	d9b825cff2	Merge "New loop filter interface"	2011-07-13 04:09:26 -07:00
Attila Nagy	c231b0175d	Update armv6 loopfilter to new interface Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa	2011-07-12 12:14:51 +03:00
Attila Nagy	283b0e25ac	Update armv7 loopfilter to new interface Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346	2011-07-12 12:12:25 +03:00
Johann	01433c5043	update x86 asm for loopfilter Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce	2011-07-08 09:23:38 -04:00
Johann	6ae12c415e	Merge "clean up warnings when building arm with rtcd"	2011-07-08 05:16:09 -07:00
Attila Nagy	622958449b	New loop filter interface Separate simple filter with reduced no. of parameters. MB filter level picking based on precalculated table. Level table updated for each frame. Inside and edge limits precalculated and updated just when sharpness changes. HEV threshold is constant. ARM targets use scalars and others vectors. Change works only with --target=generic-gnu All other targets have to be updated! Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c	2011-07-08 09:31:41 +03:00
John Koleszar	b4f70084cc	Merge "Properly use GET_GOT/RESTORE_GOT when using GLOBAL()."	2011-07-01 07:14:34 -07:00
Ronald S. Bultje	c8a23ad3f4	Properly use GET_GOT/RESTORE_GOT when using GLOBAL(). This should fix binaries using PIC on x86-32. Also should fix issue 343. Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02	2011-06-30 14:04:27 -07:00
Johann	6611f66978	clean up warnings when building arm with rtcd Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df	2011-06-29 10:51:41 -04:00
John Koleszar	f3a13cb236	Merge "Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently"	2011-06-29 07:29:59 -07:00
Johann	dc004e8c17	Merge "Avoid text relocations in ARM vp8 decoder"	2011-06-28 16:34:10 -07:00
Johann	02c30cdeef	Merge "utilize preload in ARMv6 MC/LPF/Copy routines"	2011-06-28 16:33:45 -07:00
John Koleszar	b32da7c3da	Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently There were many instances in the code of vp8_coef_tokens and vp8_coef_tokens-1, which was a preprocessor macro despite the naming convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES, respectively. Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da	2011-06-28 17:03:55 -04:00
Stefan Holmer	7296b3f922	New ways of passing encoded data between encoder and decoder. With this commit frames can be received partition-by-partition from the encoder and passed partition-by-partition to the decoder. At the encoder-side this makes it easier to split encoded frames at partition boundaries, useful when packetizing frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled, several VPX_CODEC_CX_FRAME_PKT packets will be returned from vpx_codec_get_cx_data(), containing one partition each. The partition_id (starting at 0) specifies the decoding order of the partitions. All partitions but the last have the VPX_FRAME_IS_FRAGMENT flag set. At the decoder this opens up the possibility of decoding partition N even though partition N-1 was lost (given that independent partitioning has been enabled in the encoder) if more info about the missing parts of the stream is available through external signaling. Each partition is passed to the decoder through the vpx_codec_decode() function, with the data pointer pointing to the start of the partition, and with data_sz equal to the size of the partition. Missing partitions can be signaled to the decoder by setting data != NULL and data_sz = 0. When all partitions have been given to the decoder "end of data" should be signaled by calling vpx_codec_decode() with data = NULL and data_sz = 0. The first partition is the first partition according to the VP8 bitstream + the uncompressed data chunk + DCT address offsets if multiple residual partitions are used. Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74	2011-06-28 11:10:17 -04:00
Stefan Holmer	4cb0ebe5b2	Adding support for independent partitions Adding support in the encoder for generating independent residual partitions by forcing equal probabilities over the prev coef entropy contexts. Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21	2011-06-28 11:10:17 -04:00
Mike Hommey	e3f850ee05	Avoid text relocations in ARM vp8 decoder The current code stores pointers to coefficient tables and loads them to access the tables contents. As these pointers are stored in the code sections, it means we end up with text relocations. eu-findtextrel will thus complain about code not compiled with -fpic/-fPIC. Since the pointers are stored in the code sections, we can actually cheat and let the assembler generate relative addressing when accessing the coefficient tables, and just load their location with adr. Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae	2011-06-28 09:11:40 +02:00
Fritz Koenig	be99868bd1	Fix after removal of B_MODE_INFO Change Ieb746989: Removed B_MODE_INFO missed this. Change-Id: I32202555581cc2a5d45e729c6650ada4d2df55d3	2011-06-27 09:43:21 -07:00
Adrian Grange	deca8cfc44	Fixed initialization of frame buffer ref counters Only the first frame buffer ref counter was being initialized because the index was fixed at 0 rather than using i. Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4	2011-06-24 08:43:40 -07:00
James Berry	2bd90c13a0	get/set reference buffer dimension check added vp8_yv12_copy_frame_ptr() expects same size buffers which was not previously guaranteed. Using an improperly allocated buffer would cause a crash before. Change-Id: I904982313ce9352474f80de842013dcd89f48685	2011-06-22 13:36:24 -04:00
Taekhyun Kim	458fb8f491	utilize preload in ARMv6 MC/LPF/Copy routines About 9~10% decoding perf improvement on non-Neon ARM cpus Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837	2011-06-17 14:04:53 -07:00
Scott LaVarnway	223d1b54cf	Populate bmi for B_PRED only Small decode performance gain (~1%) on keyframes. No noticeable gains on encode. Also changed pick_intra4x4mby_modes() to read the above and left block modes for keyframes only. Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5	2011-06-13 17:14:11 -04:00
Scott LaVarnway	e71a010646	Calc ref_frame_cost once per frame instead of every macro block. Change-Id: I2604e94c6b89e3a8457777e21c8c38406d55b165	2011-06-13 09:58:03 -04:00
Johann	79327be6c7	use GCC inline magic Better fix for #326. ICC happens to support the inline magic Change-Id: Ic367eea608c88d89475cb7b05d73500d2a1bc42b	2011-06-08 16:19:37 -04:00
Scott LaVarnway	773768ae27	Removed B_MODE_INFO Declared the bmi in BLOCKD as a union instead of B_MODE_INFO. Then removed B_MODE_INFO completely. Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67	2011-06-02 13:46:41 -04:00
Scott LaVarnway	4f586f7bd0	Broken EC after MODE_INFO size reduction This patch fixes the compiler errors and the seg fault when running decode_with_partial_drops. Change-Id: I7c75369e2fef81d53b790d5dabc327218216838b	2011-05-26 15:13:00 -04:00
John Koleszar	1fe5070b76	Merge "Do not copy data between encoder reference buffers."	2011-05-26 09:58:26 -07:00
Scott LaVarnway	a39321f37e	Use int_mv instead of MV in vp8_mv_cont Less operations. Change-Id: Ibb9cd5ae66b8c7c681c9a654d551c8729c31c3ae	2011-05-24 16:01:12 -04:00
Scott LaVarnway	e11f21af9a	MODE_INFO size reduction Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO. This reduced the memory footprint by 518,400 bytes for 1080 resolutions. The decoder performance improved by ~4% for the clip used and the encoder showed very small improvements. (0.5%) This reduction was first mentioned to me by John K. and in a later discussion by Yaowu. This is WIP. Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29	2011-05-24 13:24:52 -04:00
Johann	6d82d2d22e	Merge "Fixed iwalsh_neon build problems with RVDS4.1"	2011-05-20 07:51:11 -07:00
Scott LaVarnway	914f7c36d7	Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3."	2011-05-19 11:22:01 -07:00
John Koleszar	c684d5e5f2	Merge "changed configure option name to reduce confusion"	2011-05-19 11:17:08 -07:00
John Koleszar	7def902261	Fix segv without --enable-error-concealment Missed wrapping one function call in #if CONFIG_ERROR_CONCEALMENT. Change-Id: I5746b1e6e4531670dbed1130467331fe309bdcae	2011-05-19 13:57:45 -04:00
Stefan Holmer	d04f852368	Adding error-concealment to the decoder. The error-concealer is plugged in after any motion vectors have been decoded. It tries to estimate any missing motion vectors from the motion vectors of the previous frame. Intra blocks with missing residual are replaced with inter blocks with estimated motion vectors. This feature was developed in a separate sandbox (sandbox/holmer/error-concealment). Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412	2011-05-19 13:46:33 -04:00
Attila Nagy	f96d56c4aa	Fixed iwalsh_neon build problems with RVDS4.1 rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type. optimized the implementation. Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c	2011-05-19 10:27:26 +03:00
Scott LaVarnway	6b25501bf1	Using int_mv instead of MV The compiler produces better assembly when using int_mv for assignments. The compiler shifts and ors the two 16bit values when assigning MV. Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f	2011-05-12 11:08:16 -04:00
Johann	df2023a6cb	set up Global Offset Table in recon global values were being referenced, but the GOT was not being set up. as the GOT is only required for PIC, this issue wasn't caught in the default configuration. Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f http://code.google.com/p/webm/issues/detail?id=328	2011-05-10 15:58:56 -04:00
Johann	a7d4d3c550	clean up unused variable warnings Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b	2011-05-09 12:56:20 -04:00
Aron Rosenberg	eeb8117303	Fix semaphore emulation on Windows The existing emulation of posix semaphores on Windows uses SetEvent() and WaitForSingleObject(), which implements a binary semaphore, not a counting semaphore as implemented by posix. This causes deadlock when used with the expected posix semantics. Instead, this patch uses the CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows 2000) which have the expected behavior. This patch also reverts commit `eb16f00`, which split a semaphore that was being used with counting semantics into two binary semaphores. That commit is unnecessary with corrected emulation. Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0	2011-05-06 00:13:59 -04:00
Johann	ca5c1b17a2	Merge "Loopfilter NEON: Use VMOV for constant vectors instead of VLD."	2011-05-05 06:16:21 -07:00
Yunqing Wang	aeb86d615c	Merge "Runtime detection of available processor cores."	2011-05-05 04:59:54 -07:00
Attila Nagy	a6aa389d2f	Loopfilter NEON: Use VMOV for constant vectors instead of VLD. Change-Id: I562b6e01c32bb51d00f3b95faf757fc7dc29a3a3	2011-05-04 11:29:23 +03:00
John Koleszar	c09d8c1419	Merge "Fix documentation typos"	2011-05-02 06:50:22 -07:00
John Koleszar	a66d8d33dd	Fix compile error with --enable-postproc-visualizer Typo. Change-Id: I9cc6a4587c3d93c9f0da5e101d376741fc9622a4	2011-05-02 09:28:37 -04:00
Thijs Vermeir	8942f70cdf	Fix documentation typos Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce	2011-04-30 09:34:59 +02:00
Ronald S. Bultje	5a23352c03	Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3. Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035	2011-04-29 11:52:09 -07:00
Yaowu Xu	57ad189129	changed configure option name to reduce confusion Renamed configure option "enable-psnr" to "enable-internal-stats" to better reflect the purpose of the option and eliminate the confusion reported in http://code.google.com/p/webm/issues/detail?id=35 Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a	2011-04-29 09:39:05 -07:00
Scott LaVarnway	1b2abc5f49	Merge "Consolidated build inter predictors"	2011-04-29 07:13:49 -07:00
James Berry	f10732554b	bug fix removed inline from recon_wrapper_sse2.c removed inline from recon_wrapper_sse2.c to build for Visual Studio Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22	2011-04-28 15:12:00 -04:00
Scott LaVarnway	219ba87a93	Merge "Use psadbw to get the sum of bytes in a line."	2011-04-28 07:58:20 -07:00
Scott LaVarnway	ccd6f7ed77	Consolidated build inter predictors Code cleanup. Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146	2011-04-28 10:53:59 -04:00
Ronald S. Bultje	1e7ded69cf	Use psadbw to get the sum of bytes in a line. Thanks Jason for pointing that out on #vp8. ;-). Change-Id: I5330a753e752a8704b78a409597472628e0b26a5	2011-04-27 13:49:21 -07:00
Scott LaVarnway	2e102855f4	Removed unused code in reconinter The skip flag is never set by the encoder for SPLITMV. Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506	2011-04-27 15:25:32 -04:00
Ronald S. Bultje	1083fe4999	SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}(). decoding before 10.425 10.432 10.423 =10.426 after: 10.405 10.416 10.398 =10.406, 0.2% faster encoding before 14.252 14.331 14.250 14.223 14.241 14.220 14.221 =14.248 after 14.095 14.090 14.085 14.095 14.064 14.081 14.089 =14.086, 1.1% faster Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2	2011-04-27 11:31:27 -07:00
Johann	d5c46bdfc0	Merge "remove simpler_lpf"	2011-04-25 14:51:07 -07:00
Johann	01527e743f	remove simpler_lpf the decision to run the regular or simple loopfilter is made outside the function and managed with pointers stop tracking the option in two places. use filter_type exclusively Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15	2011-04-25 17:37:41 -04:00
John Koleszar	cfbfd39de8	Merge "Change rc undershoot/overshoot semantics"	2011-04-25 10:49:32 -07:00
John Koleszar	aa926fbd27	Add rc_max_intra_bitrate_pct control Adds a control to limit the maximum size of a keyframe, as a function of the per-frame bitrate. See this thread[1] for more detailed discussion: [1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38 Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94	2011-04-25 13:47:14 -04:00
Scott LaVarnway	3698c1f620	Removed dc_diff from MB_MODE_INFO The dc_diff flag is used to skip loopfiltering. Instead of setting this flag in the decoder/encoder, we now check for this condition in the loopfilter. Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931	2011-04-21 14:38:36 -04:00
Scott LaVarnway	7a49accd0b	Removed force_no_skip force_no_skip is always set to zero. Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77	2011-04-20 15:45:12 -04:00
Scott LaVarnway	09c933ea80	Removed redundant checks of the mode_info_context flags Code cleanup. The build inter predictor functions are redundantly checking the mode_info_context for either INTRA_FRAME or SPLITMV. Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb	2011-04-20 14:06:40 -04:00
Attila Nagy	43464e94ed	Do not copy data between encoder reference buffers. Golden and ALT reference buffers were refreshed by copying from the new buffer. Replaced this by index manipulation. Also moved all the reference frame updates to one function for easier tracking. Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1	2011-04-20 15:26:55 +03:00
Johann	4a2b684ef4	modify SAVE_XMM for potential 64bit use the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and if the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867	2011-04-19 10:42:45 -04:00
Johann	a9b465c5c9	Merge "Add save/restore xmm registers in x86 assembly code"	2011-04-19 06:32:10 -07:00
Johann	c7cfde42a9	Add save/restore xmm registers in x86 assembly code Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877	2011-04-18 16:30:38 -04:00
Yunqing Wang	d5069b5af0	Merge "Handle long delay between video frames in multi-thread decoder(issue 312)"	2011-04-18 10:11:41 -07:00
Yunqing Wang	8ba58951e9	Handle long delay between video frames in multi-thread decoder(issue 312) This is reported by m...@hesotech.de (see issue 312): "The decoder causes an access violation when you decode the first frame, then make a pause of about 60 seconds and then decode further frames. But only if vpx_codec_dec_cfg_t.threads> 1. This is caused by a timeout of WaitForSingleObject. When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF), the problem is solved." Reproduced the crash and verified the changes on Windows platform. This brings the behavior in line with the other platforms using sem_wait(). Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1	2011-04-15 17:27:26 -04:00
Johann	487c0299c9	remove dead code, add missing RESTORE_XMM vp8_filter_block1d16_h4_ssse3 was never called. because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was masked. however, if/when win64 used those registers for persistent data, issues could/will arise. Change-Id: I56d6effca0aeba1f86082689771cb10145d39651	2011-04-15 10:11:53 -04:00
John Koleszar	a3399291ad	Fix off-by-one in copy_and_extend_plane Should only copy h lines, not h+1. Change-Id: I802a85686635900459c6dc79596189033e5298d8	2011-04-15 08:44:39 -04:00
John Koleszar	88841f1059	Refactor lookahead ring buffer This patch cleans up the source buffer storage and copy mechanism to allow access through a standard push/pop/peek interface. This approach also avoids an extra copy in the case where the source is not a multiple of 16, fixing issue #102. Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9	2011-04-13 14:26:45 -04:00
John Koleszar	c99f9d7abf	Change rc undershoot/overshoot semantics This patch changes the rc_undershoot_pct and rc_overshoot_pct controls to set the "aggressiveness" of rate adaptation, by limiting the amount of difference between the target buffer level and the actual buffer level which is applied to the target frame rate for this frame. This patch was initially provided by arosenberg at logitech.com as an attachment to issue #270. It was modified to separate these controls from the other unrelated modifications in that patch, as well as to use the pre-existing variables rather than introducing new ones. Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62	2011-04-12 20:49:33 -04:00
John Koleszar	a9ce3e3834	Remove unused files Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08	2011-04-11 10:34:40 -04:00
Yunqing Wang	3d6815817c	Use full-pixel MV in mvsadcost calculation MV sad cost error is only used in full-pixel motion search, which only needs full-pixel resolution instead of quarter-pixel resolution. This change reduced mvsadcost table size, and removed unnecessary parameter passing since this table is constant once it is generated. Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0	2011-04-01 16:41:58 -04:00
Attila Nagy	297b27655e	Runtime detection of available processor cores. Detect the number of available cores and limit the thread allocation accordingly. On decoder side limit the number of threads to the max number of token partitions. Core detection works on Windows and Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN. Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078	2011-03-31 10:23:01 +03:00
John Koleszar	769c74c0ac	Merge "Increase static linkage, remove unused functions"	2011-03-21 04:51:51 -07:00
John Koleszar	429dc676b1	Increase static linkage, remove unused functions A large number of functions were defined with external linkage, even though they were only used from within one file. This patch changes their linkage to static and removes the vp8_ prefix from their names, which should make it more obvious to the reader that the function is contained within the current translation unit. Functions that were not referenced were removed. These symbols were identified by: $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' | sort | grep '^ *1 ' Change-Id: I59609f58ab65312012c047036ae1e0634f795779	2011-03-17 20:53:47 -04:00
John Koleszar	de50520a8c	apple: include proper mach primitives Fixes implicit declaration warning for 'mach_task_self'. This change is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d, which fixed this issue for the decoder but not the encoder. Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96	2011-03-16 13:59:32 -04:00
John Koleszar	27972d2c1d	Move build_intra_predictors_mby to RTCD framework The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s functions had global function pointers rather than using the RTCD framework. This can show up as a potential data race with tools such as helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935 for an example. Change-Id: I29c407f828ac2bddfc039f852f138de5de888534	2011-03-11 13:04:50 -05:00
Scott LaVarnway	861175ef00	Removed vp8_block2type and used defines instead. Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03	2011-02-24 14:35:18 -05:00
John Koleszar	c764c2a20f	Merge "clean up unused files"	2011-02-18 06:33:05 -08:00
John Koleszar	3ed8fe8778	remove unused vp8_predict_dc function Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42	2011-02-18 09:12:20 -05:00
John Koleszar	cbf923b12c	clean up unused files Removed a number of files that were unused or little-used. Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da	2011-02-18 09:09:49 -05:00
John Koleszar	ac10665ad8	Merge "Removed unused vp8_recon_intra4x4mb function"	2011-02-17 11:30:13 -08:00
Scott LaVarnway	07f7b66fae	Removed unused vp8_recon_intra4x4mb function Change-Id: I4a328ce152d9dbe6b0d1606d1b523e8e7bfb468e	2011-02-17 13:34:38 -05:00
John Koleszar	02321de0f2	Fix relative include paths Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c	2011-02-10 15:09:44 -05:00
Johann	7d8199f0c3	Merge "Adds armv6 optimized variance calculation"	2011-02-10 06:06:46 -08:00
John Koleszar	a39b5af10b	Merge "Put more code under #if CONFIG_MULTITHREAD."	2011-02-09 08:31:36 -08:00
Gaute Strokkenes	315e3c2518	Put more code under #if CONFIG_MULTITHREAD. Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b	2011-02-09 11:21:18 -05:00
Tero Rintaluoma	cb14764fab	Adds armv6 optimized variance calculation Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6 and adds new assembly file for variance16x16 calculation. - vp8_filter_block2d_bil_first_pass_armv6 (integrated) - vp8_filter_block2d_bil_second_pass_armv6 (integrated) - vp8_variance16x16_armv6 (new) - bilinearfilter_arm.h (new) Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db	2011-02-09 10:23:43 -05:00
Johann	e5aaac24bb	clean up bilinear filter make reference version of bilinear_filters short. use reference versions of bilinear_filters and sub_pel_filters when possible. recognize that Width was being passed into filter_block2d_bil_first_pass multiple times. ARM version had already fixed this. propagate to C. change references to src_pixels_per_line to src_pitch and standardize on src/dst (instead of input/output). recognize that first_pass is only run in the vertical and second_pass only horizontal. ARM version had already fixed this. propagate to C Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2	2011-02-08 17:42:54 -05:00
Johann	40dcae9c2e	clarify _offsets.asm differences it's difficult to mux the _offsets.c files because of header conflicts. make three instead, name them consistently and partition the contents to allow building them as required. Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796	2011-02-08 16:35:43 -05:00
Johann	3273c7b679	move one of the offset files common/arm/vpx_asm_offsets moves up a level. prepare for muxing with encoder/arm/vpx_vp8_enc_asm_offsets Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24	2011-02-07 11:35:30 -05:00
Johann	bb9c95ea53	remove unused dboolhuff code we were holding on to this "just in case." purge it instead Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a	2011-02-04 16:00:00 -05:00
Yunqing Wang	350ffe8dae	Merge "Improve MV prediction in vp8_pick_inter_mode() for speed>3"	2011-02-04 10:10:15 -08:00
Gaute Strokkenes	ffc6aeef14	Remove duplicate loopfilter parameters. Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8	2011-02-04 14:55:02 +00:00
Gaute Strokkenes	bf5f585b0d	Make vp8_adjust_mb_lf_value return the updated value rather than manipulating it in situ via a pointer. Change-Id: If4a87a4eccd84f39577c0e91e171245f4954c5cf	2011-02-03 19:24:16 +00:00
Johann	f3cb9ae459	Merge "Adds "armvX-none-rvct" targets"	2011-01-28 09:03:58 -08:00
Yunqing Wang	7cbe684ef5	Improve MV prediction in vp8_pick_inter_mode() for speed>3 Applied same method used in vp8_rd_pick_inter_mode() to improve the accuracy of MV prediction. Change-Id: Ia50ae26208b18482695601f32febd99fe89fbc17	2011-01-28 10:00:20 -05:00
Tero Rintaluoma	11a222f5d9	Adds "armvX-none-rvct" targets Adds following targets to configure script to support RVCT compilation without operating system support (for Profiler or bare metal images). - armv5te-none-rvct - armv6-none-rvct - armv7-none-rvct To strip OS specific parts from the code "os_support"-config was added to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS specific parts such as OS specific includes and function calls for timers and threads etc. This was done to enable RVCT compilation for profiling purposes or running the image on bare metal target with Lauterbach. Removed separate AREA directives for READONLY data in armv6 and neon assembly files to fix the RVCT compilation. Otherwise "ldr <reg>, =label" syntax would have been needed to prevent linker errors. This syntax is not supported by older gnu assemblers. Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43	2011-01-28 12:47:39 +02:00
Yunqing Wang	cac54404b9	Remove copies of same functions Reduce the code size. Change-Id: I2e1998557a3c8776e262c442fd758c25e17aff7a	2011-01-26 15:37:00 -05:00
John Koleszar	2f0331c90c	Merge "Implement error tracking in the decoder"	2011-01-19 05:51:00 -08:00
Henrik Lundin	67fb3a5155	Implement error tracking in the decoder A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output from the function is non-zero if the last decoded frame contains corruption due to packet losses. The decoder is also modified to accept encoded frames of zero length. A zero length frame indicates to the decoder that one or more frames have been completely lost. This will mark the last decoded reference buffer as corrupted. The data pointer can be NULL if the length is zero. Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce	2011-01-19 09:53:21 +01:00
Henrik Lundin	48c28fc42c	Remove unused local variables Removing unused local variables causing compiler warnings in Visual Studio. Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a	2011-01-11 15:22:19 +01:00
Paul Wilkins	e0846c9c8c	CQ Mode The merge includes hooks for CQ mode and other code changes merged from the test branch. CQ mode attempts to maintain a more stable quantizer within a clip whilst also trying to adhere to a guideline maximum bitrate. The existing target data rate parameter is used to specify the guideline maximum bitrate. A new parameter allows the user to specify a target CQ level. For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the user specified value (0-63). However, in some cases the encoder may choose to impose a target CQ that is above that specified by the user, if it estimates that consistent use of the target value is not compatible with guideline maximum bitrate. Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1	2011-01-07 18:46:29 +00:00
Yunqing Wang	a864678cdb	Always update last_frame_type Scott pointed out that last_frame_type only gets updated while loopfilter exists. Since last_frame_type is also needed in motion search now, it needs to be updated every frame. Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912	2010-12-29 10:28:35 -05:00
John Koleszar	b0da9b399d	Add psnr/ssim tuning option Add a new encoder control, VP8E_SET_TUNING, to allow the application to inform the encoder that the material will benefit from certain tuning. Expose this control as the --tune option to vpxenc. The args helper is expanded to support enumerated arguments by name or value. Two tunings are provided by this patch, PSNR (default) and SSIM. Activity masking is made dependent on setting --tune=ssim, as the current implementation hurts speed (10%) and PSNR (2.7% avg, 10% peak) too much for it to be a default yet. Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257	2010-12-17 10:01:05 -05:00
Johann	825adc464f	shrink TOKENEXTRA and vp8_extra_bit_struct Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes original: `b7b1e6fb` reverted: `41f4458a` Also drop unused field from vp8_extra_bit_struct Update ARM ASM to deal with this change. In particular, Extra is signed and needs to be sign-extended when loaded. Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41	2010-12-14 10:32:50 -05:00
John Koleszar	b1aa54ab26	remove unused temporal preproc code This code is unused, as the current preproc implementation uses the same spatial filter that postproc uses. Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7	2010-12-13 16:47:59 -05:00
Fritz Koenig	e0cf330cde	vp8 fast quantizer sse2 optimizations for eob. Changed the end of block computation to use pmaxw. Removed additional pushing and popping of registers that was not needed. Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98	2010-12-09 15:00:30 -08:00
Yaowu Xu	d49da085c0	correct errors in token alphabet descriptions There were a few errors in the comment section that describe VP8 token alphabet table. Change-Id: Ie6728a0e08bc3798893221b60408d5b201064bdc	2010-11-16 10:51:43 -08:00
Fritz Koenig	647df00f30	postproc : Re-work postproc calling to allow more flags. Debugging in postproc needs more flags to allow for specific block types to be turned on or off in the visualizations. Must be enabled with --enable-postproc-visualizer during configuration time. Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d	2010-11-10 14:14:46 -08:00
Fritz Koenig	0e7b60617f	postproc : Update visualizations. Change color reference frame to blend the macro block edge. This helps with layering of visualizations. Add block coloring for intra prediction modes. Change-Id: Icefe0e189e26719cd6937cebd6727efac0b4d278	2010-11-04 10:35:02 -07:00
Fritz Koenig	0a29bd9793	postproc : Fix display of motion vectors. Split motion vectors were all being treated as 4x4 blocks. Now correctly handle 16x8, 8x16, 8x8, 4x4 blocks. Change-Id: Icf345c5e69b5e374e12456877ed7c41213ad88cc	2010-11-02 13:29:13 -07:00
Fritz Koenig	9f61a83bf9	postproc : Added SPLITMV visualization, fix line constrain. Now draw 16 vectors for SPLITMV mode. Fixed constrain line to block divide by zero issues. Blend block was not centering the shaded area correctly. Change-Id: I1edabd8b4e553aac8d980f7b45c80159e9202434	2010-11-01 13:27:13 -07:00
Timothy B. Terriberry	c4d7e5e67e	Eliminate more warnings. This eliminates a large set of warnings exposed by the Mozilla build system (Use of C++ comments in ISO C90 source, commas at the end of enum lists, a couple incomplete initializers, and signed/unsigned comparisons). It also eliminates many (but not all) of the warnings exposed by newer GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite without checking the return values). There are a few spurious warnings left on my system: ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used uninitialized in this function gcc seems to be unable to figure out that the value shortcut doesn't change between the two if blocks that test it here. ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned expression >= 0 is always true ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned expression >= 0 is always true This is true, so far as it goes, but it's comparing against an enum, and the C standard does not mandate that enums be unsigned, so the checks can't be removed. Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395	2010-10-27 18:08:04 -07:00
Fritz Koenig	a097e18964	postproc: Tweaks to line drawing and blending. Turned down the blending level to make colored blocks obscure the video less. Not blending the entire block to give distinction to macro block edges. Added configuration so that macro block blending function can be optimized. Change to constrain line as to when dx and dy are computed. Now draw two lines to form an arrow. Change-Id: Id3ef0fdeeab2949a6664b2c63e2a3e1a89503f6c	2010-10-27 13:20:03 -07:00
Johann	787733d855	Merge "RTCD build is bringing old errors to light"	2010-10-27 09:59:01 -07:00
Fritz Koenig	cf127474d8	vpxdec : Change --pp-debug-info to be a bit field. This allows multiple post processor debug levels to be overlaid, e.g. it can show colored reference blocks and visual motion vectors. Change-Id: Ic4a1df438445b9f5780fe73adb3126e803472e53	2010-10-27 09:53:37 -07:00
Fritz Koenig	36ff6a6743	Merge "postproc: Add mode and reference frame visualizers."	2010-10-27 09:04:39 -07:00
Johann	abcf36c758	RTCD build is bringing old errors to light needs to be _recon_ not _recon_recon_ Change-Id: I7a8b9ddcb4fb72c2b723c563932c9ea52ff15982	2010-10-27 10:47:48 -04:00
Fritz Koenig	a0ccc97d8a	postproc: Add mode and reference frame visualizers. Post process option to color the block for either the mode of the macro block, or the frame that the macro block references. Change-Id: Ie498175497f2d20e3319924d352dc4ddc16f4134	2010-10-26 16:00:14 -07:00
John Koleszar	d6c67f02c9	make vp8_recon16x16mb{,y} RTCD functions ARM NEON has a platform specific version of vp8_recon16x16mb, though it's just a stub to extract the various parameters from the MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using that function's prototype directly will be a better long term solution, but it's quite an invasive change. Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620	2010-10-26 13:23:36 -04:00
John Koleszar	19638c2309	arm: move unrolled loops back to generic code Some of the ARM functions differed from their generic counterparts only by unrolling their loops. Since this change may be useful on other platforms, or might even supersede the looped version in the generic case, move it back to the generic file. This code is left under #if ARCH_ARM for now, but it may be worth considering a different (possibly new) conditional for these. If it turns out that this should be runtime selectable, these functions will have to move to the RTCD infrastructure. Don't want to take that step at this time without more profile data. Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f	2010-10-26 09:51:35 -04:00
John Koleszar	d330a5876b	arm: remove duplicate functions These functions were true duplicates of functions present in the generic code. This fixes some of the link errors when building with --enable-shared --enable-pic. Change-Id: Idff26599d510d954e439207883607ad6b74df20c	2010-10-26 09:37:44 -04:00
Fritz Koenig	1d70aaf08b	Merge "Debug option for drawing motion vectors."	2010-10-25 15:40:22 -07:00
Fritz Koenig	d1a4cce809	Debug option for drawing motion vectors. Postproc level that uses Bresenham's line algorithm to draw motion vectors onto the postproc buffer. Change-Id: I34c7daa324f2bdfee71e84fcb1c50b90fa06f6fb	2010-10-25 15:39:04 -07:00
Johann	1376f061da	reuse common loopfilter code there were four versions for the regular and macroblock loopfilters: horizontal [y|uv] vertical [y|uv] this moves all the common code into 2 functions: vp8_loop_filter_neon vp8_mbloop_filter_neon this provides no gain in performance. there's a bit of jitter, but it trends down ~0.25-0.5%. however, this is a huge gain in maintenance. also, there is the potential to drop some stack usage in the macroblock loopfilter. Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92	2010-10-25 09:48:50 -04:00
Timothy B. Terriberry	b71962fdc9	Add runtime CPU detection support for ARM. The primary goal is to allow a binary to be built which supports NEON, but can fall back to non-NEON routines, since some Android devices do not have NEON, even if they are otherwise ARMv7 (e.g., Tegra). The configure-generated flags HAVE_ARMV7, etc., are used to decide which versions of each function to build, and when CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen at run time. In order for this to work, the CFLAGS must be set to something appropriate (e.g., without -mfpu=neon for ARMv7, and with appropriate -march and -mcpu for even earlier configurations), or the native C code will not be able to run. The ASFLAGS must remain set for the most advanced instruction set required at build time, since the ARM assembler will refuse to emit them otherwise. I have not attempted to make any changes to configure to do this automatically. Doing so will probably require the addition of new configure options. Many of the hooks for RTCD on ARM were already there, but a lot of the code had bit-rotted, and a good deal of the ARM-specific code is not integrated into the RTCD structs at all. I did not try to resolve the latter, merely to add the minimal amount of protection around them to allow RTCD to work. Those functions that were called based on an ifdef at the calling site were expanded to check the RTCD flags at that site, but they should be added to an RTCD struct somewhere in the future. The functions invoked with global function pointers still are, but these should be moved into an RTCD struct for thread safety (I believe every platform currently supported has atomic pointer stores, but this is not guaranteed). The encoder's boolhuff functions did not even have _c and armv7 suffixes, and the correct version was resolved at link time. The token packing functions did have appropriate suffixes, but the version was selected with a define, with no associated RTCD struct. However, for both of these, the only armv7 instruction they actually used was rbit, and this was completely superfluous, so I reworked them to avoid it. The only non-ARMv4 instruction remaining in them is clz, which is ARMv5 (not even ARMv5TE is required). Considering that there are no ARM-specific configs which are not at least ARMv5TE, I did not try to detect these at runtime, and simply enable them for ARMv5 and above. Finally, the NEON register saving code was completely non-reentrant, since it saved the registers to a global, static variable. I moved the storage for this onto the stack. A single binary built with this code was tested on an ARM11 (ARMv6) and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder, and produced identical output, while using the correct accelerated functions on each. I did not test on any earlier processors. Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa	2010-10-25 09:23:29 -04:00
Timothy B. Terriberry	8f75ea6b5c	Convert [4][4] matrices to [16] arrays. Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23	2010-10-21 17:04:30 -07:00
Yaowu Xu	fc2f8dafaf	Merge "fixed a typo that mis-used Y plane stride for UV blocks."	2010-10-19 16:23:31 -07:00
Yunqing Wang	7804befb55	Fix one gcc compiler warning ../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’: ../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’ Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f	2010-10-14 15:15:35 -04:00
Jan Kratochvil	1fc294116a	nasm: movhps compatibility QWORD->MMWORD Filed for nasm as: https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208 nasm just does not accept any size parameter for movhps: 1.asm:2: error: mismatch in operand sizes Some parts of libvpx already use MMWORD for movhps and MMWORD is defined-out so it is compatible both with yasm and nasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc	2010-10-04 20:47:19 -04:00
Jan Kratochvil	5cdc3a4c29	nasm: address labels 'rel label' vice 'wrt rip' nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50	2010-10-04 19:47:54 -04:00
Jan Kratochvil	e114f699f6	nasm: match instruction length (movd/movq) to parameters nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91	2010-10-04 23:36:29 +02:00
Yaowu Xu	49fdb7c41e	fixed a typo that mis-used Y plane stride for UV blocks. Raised by Lei Yang, the Y plane stride was used for UV blocks. This is clearly a typo. But as the comments in the code suggest that this part of the code has not been used yet, the typo should not have caused any damage yet. Change-Id: Iea895edc17469a51c803a8cc6d0fce65a1a7fc2f	2010-10-04 11:31:14 -07:00
Johann	f143a81191	Merge "Fix valgrind errors in the NEON loop filters."	2010-10-01 06:18:53 -07:00
Timothy B. Terriberry	a465076e02	Fix valgrind errors in the NEON loop filters. Like the ARMv6 code, these functions were accessing values below the stack pointer, which can be corrupted by signal delivery at any time.	2010-09-30 20:40:45 -07:00
John Koleszar	a047fee606	Merge "Fix loopfilter delta zero transitions"	2010-09-30 10:26:10 -07:00
Fritz Koenig	439b2ecd74	Merge "Optimizations on the loopfilters."	2010-09-29 10:47:01 -07:00
John Koleszar	b9be7a464f	Fix loopfilter delta zero transitions Loopfilter deltas are initialized to zero on keyframes in the decoder. The values then persist from the previous frame unless an update bit is set in the bitstream. This data is not included in the entropy data saved by the 'refresh entropy' bit in the bitstream, so it is effectively an additional contextual element beyond the 3 ref-frames and the entropy data. The encoder was treating this delta update bit as update-if-nonzero, meaning that the value would be refreshed even if it hadn't changed, and more significantly, if the correct value for the delta changed to zero, the update wouldn't be sent, and the decoder would preserve the last (presumably non-zero) value. This patch updates the encoder to send an update only if the value has changed from the previously transmitted value. It also forces the value to be transmitted in error resilient mode, to account for lost context in the event of lost frames. Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868	2010-09-29 13:04:04 -04:00
Fritz Koenig	0964ef0e71	Optimizations on the loopfilters. - Scheduling for Atom processors - Combining of macros to allow for better interleaving - Change from multiplies to adds for main filter - Use of movhps/movlps to fill xmm registers without shifting and orring Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b	2010-09-28 12:01:34 -07:00
Timothy B. Terriberry	18dc92fd66	Add 4-tap version of 2nd-pass ARMv6 MC filter. The existing code applied a 6-tap filter with 0's on either end. We're already paying the branch penalty to avoid computing the two extra columns needed as input to this filter. We might as well save time computing the filter as well. This reduces the inner loop from 21 instructions to 16, the number of loads per iteration from 4 to 1, and the number of multiplies from 7 to 4. The gain in overall decoding performance, however, is small (less than 1%). This change also means we are now valgrind clean on ARMv6, which is its real purpose. The errors reported here were valgrind's fault (it does not detect that 0 times an uninitialized value is initialized), but Julian Seward says it would slow down valgrind considerably to make such checks. Speeding up libvpx instead, even by a small amount, seems a much better idea if only to enable proper valgrind checking of the rest of the codec. Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16	2010-09-27 18:25:45 -07:00
John Koleszar	2b521ab551	move reconintra_mt to decoder (fixup) Missed the .h file in the move. Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc	2010-09-27 12:48:31 -04:00
Johann	063be9b82a	Merge "combine max values and compare once"	2010-09-27 06:39:20 -07:00
Timothy B. Terriberry	e2795e9978	Fix valgrind errors in vp8_sixtap_predict8x4_armv6(). This function was accessing values below the stack pointer, which can be corrupted by signal delivery at any time. Change-Id: I92945b30817562eb0340f289e74c108da72aeaca	2010-09-24 14:34:18 -07:00
Johann	f30e8dd7bd	combine max values and compare once previous implementation compared each set of values to limit and then &'d them together, requiring a compare and & for each value. this does the accumulation first, requiring only one compare Change-Id: Ia5e3a1a50e47699c88470b8c41964f92a0dc1323	2010-09-24 15:42:50 -04:00
John Koleszar	48e76ff4fd	move reconintra_mt to decoder (for now) reconintra_mt.c is only required for building the decoder right now. It could definitely be used for the encoder in the future, but it currently depends on decoder only data structures. (onyxd_int.h, VP8D_COMP, etc). Move it from common/ to decoder/ until the necessary changes to the common multithread code are complete. This patch is needed to build with --disable-vp8-decoder. Change-Id: I568c52221a2b309234d269675cba97131ce35c86	2010-09-24 11:23:06 -04:00
Johann	7fed3832e7	Remove dead code The new loopfilter was originally introduced as an experimental change. It's permanent now. Change-Id: I25dbedb6ceff3e9f9c04e18bb29f84c3ecb7e546	2010-09-22 11:07:34 -04:00
Yunqing Wang	a23ccf8f8c	Merge "Restructure multi-threaded decoder"	2010-09-21 05:00:30 -07:00
Fritz Koenig	b7dc9398f2	Use movq instead of movdqu. Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f	2010-09-20 11:34:26 -07:00
Fritz Koenig	8eae7fe7e8	Better choice of instruction for filter mask comparison. Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82	2010-09-20 10:20:38 -07:00
Yunqing Wang	f857a85088	Restructure multi-threaded decoder On each MB, loopfiltering is done right after MB decoding. This combines two loops in multi-threaded code into one, which reduces number of synchronizations to half. The above-row/left-col data are saved in temp buffers for next-row/next MB decoding. Tests on 4-core gLucid machine showed 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms isn't done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9	2010-09-17 09:56:05 -04:00
Fritz Koenig	769f2424cc	Removed unnecessary pxor. There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1	2010-09-13 18:34:34 -07:00
Fritz Koenig	a65cd3def0	Make block access to frame buffer sequential Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d	2010-09-10 16:27:28 -07:00
Fritz Koenig	6d90f867e4	Merge branch 'master' of git://review.webmproject.org/libvpx	2010-09-09 08:54:21 -07:00
John Koleszar	c2140b8af1	Use WebM in copyright notice for consistency Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba	2010-09-09 10:01:21 -04:00
Fritz Koenig	3fb37162a8	Bilinear subpixel optimizations for ssse3. Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea	2010-09-07 17:19:40 -07:00
Scott LaVarnway	0de458f6b9	Reduced the size of MB_MODE_INFO Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613	2010-09-03 16:43:23 -04:00
James Zern	76640f85da	encoder: remove postproc dependency Remove the dependency on postproc.c for the encoder in general, the only unchecked need for it is when CONFIG_PSNR is enabled. All other cases are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file will still be included. Additionally, when VP8_SET_POSTPROC is used with the encoder when post processing has been disabled an error will be returned. This addresses issue #153. Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089	2010-09-02 11:52:37 -04:00
Yunqing Wang	0e78efad0b	Replace sleep(0) calls in multi-threaded decoder This is a workaround for gLucid problem. Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa	2010-08-31 20:37:11 -04:00
Johann	0b94f5d6e8	followup arm patch make the arm asm detokenizer work with the new structures Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a	2010-08-31 11:41:10 -04:00
Scott LaVarnway	e85e631504	Changed above and left context data layout The main reason for the change was to reduce cycles in the token decoder. (~1.5% gain for 32 bit) This layout should be more cache friendly. As a result of this change, the encoder had to be updated. Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837 Note: dixie uses a similar layout	2010-08-31 11:24:30 -04:00
Timothy B. Terriberry	7a8e0a2935	Fix harmless off-by-1 error. The memory being zeroed in vp8_update_mode_info_border() was just allocated with calloc, and so the entire function is actually redundant, but it should be made correct in case someone expects it to actually work in the future. Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79	2010-08-27 16:07:54 -07:00
Fritz Koenig	93c32a55c2	Rework idct calling structure. Moving the eob structure allows for a non-struct based function to handle decoding an entire mb of idct/dequant/recon data. This allows for SIMD functions to idct/dequant/recon multiple blocks at once. SSE2 implementation gives 3% gain on Atom. Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2	2010-08-23 08:58:54 -07:00
Jim Bankoski	b0660457fe	Revert "Removed ssse3 sixtap code" This reverts commit `6ea5bb85cd`.	2010-08-19 15:58:27 -04:00
Johann	52852da7c9	cleanup simple loop filter move some things around, reorder some instructions. constant 0 is used several times. load it once per call in horiz, once per loop in vert. separate saturating instructions to avoid stalls. just use one usub8 call to set GE flags, rather than uqsub8 followed by usub8 w/ 0. document some stalls for further consideration. Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec	2010-08-19 13:37:40 -04:00
Johann	a522be2941	Merge "fix armv6 simpleloop filter"	2010-08-19 08:31:57 -07:00
Johann	467a0b99ab	fix armv6 simpleloop filter test cases were causing a crash because the count was being read incorrectly. after fixing that, noticed that the output was not matching. fixed that. Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27	2010-08-19 11:29:21 -04:00
Scott LaVarnway	6ea5bb85cd	Removed ssse3 sixtap code Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4	2010-08-18 15:34:09 -04:00
Johann	c75f3993c0	store more vars than we removed only saved r4-11+lr, but were storing r4-r12+lr Change-Id: If77df1998af50e9badee7d99ef53543046434675	2010-08-16 10:32:15 -04:00
John Koleszar	80d3923a78	move segmentation_common to encoder vp8_update_gf_useage_maps() is only used by the encoder. This patch fixes the ability to build in decode-only or encode-only configurations. Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa	2010-08-13 14:54:24 -04:00
Johann	633646b73b	update structure mode_info_context->mbmi no longer gets copied up a level Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995	2010-08-12 16:37:55 -04:00
Johann	1ec7981c34	remove unused definition asm_offsets contains some definitions which are no longer used. this was one of them. v6 build works now Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df	2010-08-12 16:37:55 -04:00
Scott LaVarnway	9c7a0090e0	Removed unnecessary MB_MODE_INFO copies These copies occurred for each macroblock in the encoder and decoder. The temp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result, a large number of compile errors had to be fixed. Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3	2010-08-12 16:25:43 -04:00
Scott LaVarnway	f5615b6149	Merge "Finished vp8_sixtap_predict4x4_ssse3 function"	2010-08-11 12:23:24 -07:00
John Koleszar	392a958274	avoid negative array subscript warnings The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV and LEFT4X4, respectively, rather than being zero-based like the other token encodings. Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5	2010-08-11 13:49:12 -04:00
Scott LaVarnway	b07e5b6fa1	Finished vp8_sixtap_predict4x4_ssse3 function Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3 assembly routines. Also removed unused assembly. Change-Id: I01c1021835f2edda9da706822345f217087ca0d0	2010-08-11 13:49:00 -04:00
Johann	c0ba42d3c0	rename DETOK_[AL] everything else uses lowercase detok Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04	2010-08-11 13:36:35 -04:00
Scott LaVarnway	99f46d62d9	Moved gf_active code to encoder only The gf_active code is only used by the encoder, so it was moved from common and decoder. Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025	2010-08-11 11:54:25 -04:00
Scott LaVarnway	e4fe866949	Added ssse3 version of sixtap filters Improved decoder performance by 9% for the clip used. Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9	2010-08-10 17:33:49 -04:00
John Koleszar	618c7d27a0	Mark loopfilter C functions as static Clang defaults to C99 mode, and inline works differently in C99. (gcc, on the other hand, defaults to a special gnu-style inlining, which uses different syntax.) Making the functions static makes sure clang doesn't decide to discard a function because it's too large to inline. Thanks to eli.friedman for the patch. Fixes http://code.google.com/p/webm/issues/detail?id=114 Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4	2010-08-09 09:36:44 -04:00
John Koleszar	cfb204eaf7	Merge "Issue 150: Fixing linker warning in extend.c."	2010-08-02 09:35:05 -07:00
Jan Kratochvil	0e8f108fb0	nasm: avoid space before the :data symbol type. global label:data ^^ Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021	2010-08-02 09:20:42 -04:00
Frank Galligan	062e6c1886	Removed two unused global variables. Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems because it was increasing the .bss by 1572864 bytes. Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87	2010-07-28 17:25:09 -04:00
Johann	56f5a9a060	update arm idct functions Jeff Muizelaar posted some changes to the idct/reconstruction c code. This is the equivalent update for the arm assembly. This shows a good boost on v6, and a minor boost on neon. Here are some numbers for highway in qcif, 2641 frames: HEAD neon: ~161 fps; new neon: ~162 fps; HEAD v6: ~102 fps; new v6: ~106 fps. The following functions have been updated for armv6 and neon: vp8_dc_only_idct_add, vp8_dequant_idct_add, vp8_dequant_dc_idct_add. Conflicts: vp8/decoder/arm/armv6/dequantdcidct_v6.asm, vp8/decoder/arm/armv6/dequantidct_v6.asm. Resolved by removing these files. When I rewrote the functions, I also moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4	2010-07-26 08:55:19 -04:00
Justin Lebar	1d8277f8e8	Issue 150: Fixing linker warning in extend.c.	2010-07-23 16:42:25 -07:00
Jeff Muizelaar	98fcccfe97	Change the x86 idct functions to do reconstruction at the same time Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f	2010-07-23 15:21:36 -04:00
Jeff Muizelaar	b2fa74ac18	Combine idct and reconstruction steps This moves the prediction step before the idct and combines the idct and reconstruction steps into a single step. Combining them seems to give an overall decoder performance improvement of about 1%. Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318	2010-07-23 15:21:36 -04:00
Fritz Koenig	0ce3901282	Swap alt/gold/new/last frame buffer ptrs instead of copying. At the end of the decode, frame buffers were being copied. The frames are not updated after the copy, they are just for reference on later frames. This change allows multiple references to the same frame buffer instead of copying it. Changes needed to be made to the encoder to handle this. The encoder is still doing frame buffer copies in similar places where pointer reference could be done. Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66	2010-07-23 14:53:59 -04:00
Michael Kohler	1e23f45119	Fix misspelled "skiped" in onyxc_int.h to "skipped". Signed-off-by: Michael Kohler <michaelkohler@live.com>	2010-07-07 20:06:04 +02:00
Yunqing Wang	29d586b462	Add loopfilter initialization fix in multithreading code Modified loopfilter initialization to avoid unnecessary operations. Change-Id: I9fd1a5a49edc1cb8116c2a72a6908b1e437459ec	2010-06-30 09:42:39 -04:00
Yunqing Wang	bead039d4d	Improve SSE2 loopfilter functions Restructured and rewrote SSE2 loopfilter functions. Combined u and v into one function to take advantage of SSE2 128-bit registers. Tests on test clips showed a 4% decoder performance improvement on Linux desktop. Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987	2010-06-29 15:23:14 -04:00
Timothy B. Terriberry	9f81463454	Fix a linker error on x86-64 Linux when not using a version script. If the version script produced by the libvpx build system is not used when linking a shared library on x86-64 Linux, the constant data in the subpel filters produces R_X86_64_32 relocation errors due to the use of wrt rip addressing instead of wrt rip wrt ..gotpcrel. Instead of adding a new macro for this addressing mode, this patch sets the ELF visibility of these symbols to "hidden", which allows wrt rip addressing to work without a text relocation. This allows building a shared library without using the provided build system or a separate version script. Fixes http://code.google.com/p/webm/issues/detail?id=46 Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2	2010-06-21 08:19:12 -04:00
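
Commit 8eae7fe7e8 above replaces a psubusb/por sequence with pmaxub when checking whether any comparison goes over the limit. The following is a minimal scalar sketch of why the two patterns agree, not the actual libvpx assembly: the helper names (`sat_sub_u8`, `max_u8`, `over_limit_sub_or`, `over_limit_max`) are hypothetical, and it assumes every value is tested against the same single limit.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of psubusb: unsigned saturating subtract. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : 0);
}

/* Scalar model of one byte lane of pmaxub: unsigned maximum. */
static uint8_t max_u8(uint8_t a, uint8_t b) {
    return a > b ? a : b;
}

/* Older pattern: saturating-subtract the limit from every value and OR
 * the results together. A nonzero result means some value exceeded limit. */
static int over_limit_sub_or(const uint8_t *vals, int n, uint8_t limit) {
    uint8_t acc = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        acc |= sat_sub_u8(vals[i], limit);
    return acc != 0;
}

/* Newer pattern: keep a running maximum, then do a single saturating
 * subtract at the end. max(values) > limit iff any value > limit. */
static int over_limit_max(const uint8_t *vals, int n, uint8_t limit) {
    uint8_t m = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        m = max_u8(m, vals[i]);
    return sat_sub_u8(m, limit) != 0;
}
```

In this model the max-based version needs one operation per value plus one final subtract, where the older version needs a subtract and an OR per value. That matches the intent stated in the commit message; actual instruction counts in the assembly may differ.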

465 Commits