generic-library/vpx

Author	SHA1	Message	Date
Johann	a9b465c5c9	Merge "Add save/restore xmm registers in x86 assembly code"	2011-04-19 06:32:10 -07:00
Johann	c7cfde42a9	Add save/restore xmm registers in x86 assembly code Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877	2011-04-18 16:30:38 -04:00
Yunqing Wang	48438d6016	Merge "Use sub-pixel search's SSE in mode selection"	2011-04-18 13:20:04 -07:00
Yunqing Wang	b8f0b59985	Use sub-pixel search's SSE in mode selection Passed SSE from sub-pixel search back to pick_inter_mode function, which is compared with the encode_breakout to see if we could skip evaluating the remaining modes. Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d	2011-04-18 16:12:28 -04:00
Johann	cd103a5721	Merge "store quant_shift as an unsigned char"	2011-04-18 10:03:40 -07:00
Yaowu Xu	c619f6cb0f	Merge "fixed an overflow in ssim calculation"	2011-04-18 07:44:34 -07:00
Adrian Grange	0d2abe3084	Merge "Fix usage of value returned by vp8_pick_intra4x4mby_modes"	2011-04-15 08:37:19 -07:00
Yunqing Wang	1312a7a2e2	Merge "Reduce unnecessary distortion computation"	2011-04-15 08:17:03 -07:00
Yunqing Wang	918fb5487e	Reduce unnecessary distortion computation In vp8_pick_inter_mode(), for NEWMV mode, use the error result from motion search as the distortion. This helps performance in real-time mode. Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c	2011-04-14 15:53:33 -04:00
John Koleszar	63f15987a5	Merge "Refactor lookahead ring buffer"	2011-04-14 12:35:01 -07:00
Fritz Koenig	e749ae510f	Merge "Use consistent delimiters."	2011-04-14 11:56:18 -07:00
Adrian Grange	8608de1c6f	Fix usage of value returned by vp8_pick_intra4x4mby_modes The value of distortion2 returned by vp8_pick_intra4x4mby_modes was being overwritten by the value returned by get16x16prederror before it was tested. Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a	2011-04-14 10:50:00 -07:00
Fritz Koenig	33cefd6f6e	Use consistent delimiters. opsnr.stt file was using \t for delimiters on everything except between VPXSSIM and Time. Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df	2011-04-13 15:06:17 -07:00
Adrian Grange	8861174624	Fixed use of early breakout in vp8_pick_intra4x4mby_modes Index i is used to detect early breakout from the first loop, but its value is lost due to reuse in the second for loop. I moved the position of the second loop and did some format cleanup. Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5	2011-04-13 12:56:46 -07:00
John Koleszar	88841f1059	Refactor lookahead ring buffer This patch cleans up the source buffer storage and copy mechanism to allow access through a standard push/pop/peek interface. This approach also avoids an extra copy in the case where the source is not a multiple of 16, fixing issue #102. Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9	2011-04-13 14:26:45 -04:00
Johann	70f30aa95d	store quant_shift as an unsigned char in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant only use 8 bits to store this, instead of 16. will allow saving an xmm register in an updated version of the regular quantize Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90	2011-04-13 13:50:12 -04:00
John Koleszar	c99f9d7abf	Change rc undershoot/overshoot semantics This patch changes the rc_undershoot_pct and rc_overshoot_pct controls to set the "aggressiveness" of rate adaptation, by limiting the amount of difference between the target buffer level and the actual buffer level which is applied to the target frame rate for this frame. This patch was initially provided by arosenberg at logitech.com as an attachment to issue #270. It was modified to separate these controls from the other unrelated modifications in that patch, as well as to use the pre-existing variables rather than introducing new ones. Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62	2011-04-12 20:49:33 -04:00
John Koleszar	538f110407	Merge "Bugfix for error accumulator stats"	2011-04-12 06:59:00 -07:00
John Koleszar	e689a27d62	Bugfix for error accumulator stats Previous to commit de4e9e3, there was an early return in the alt-ref case that was inadvertently removed when the function was refactored to return void. This patch restores the prior behavior. Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2	2011-04-12 08:47:33 -04:00
Yunqing Wang	4fd81a99f8	Set cpu_used range to [-16, 16] in real-time mode Remove encoding speed limitation in real-time mode. Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617	2011-04-11 15:55:04 -04:00
Yunqing Wang	d1abe62d1c	Define RDCOST only once Clean up the code. Change-Id: I7db048efa4d972b528d553a7921bc45979621129	2011-04-11 11:53:56 -04:00
John Koleszar	a9ce3e3834	Remove unused files Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08	2011-04-11 10:34:40 -04:00
Yunqing Wang	4b43167ad1	Fix input MV for full search Input MV needs to be modified to full-pixel precision. Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c	2011-04-08 16:29:41 -04:00
Johann Koenig	6e156a4cd7	Merge "use asm_offsets with vp8_fast_quantize_b_sse3"	2011-04-08 10:05:47 -07:00
John Koleszar	921a32a306	Merge "Error accumulator stats bug."	2011-04-08 08:20:32 -07:00
Paul Wilkins	de4e9e3b44	Error accumulator stats bug. The error accumulator stats values cpi->prediction_error and cpi->intra_error were being populated with rd values not distortion values. These are only "currently" used in a limited way for RT compress key frame detection. Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021	2011-04-08 14:21:36 +01:00
Jim Bankoski	d4cdb683a4	fixed an overflow in ssim calculation This commit fixed an overflow in ssim calculation and added register save and restore to make sure the assembly code works on the x64 platform. It also changed the sampling points to every 4x4 instead of 8x8 and adjusted the constants in SSIM calculation to match the scale of previous VPXSSIM. Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b	2011-04-07 14:25:25 -07:00
Johann Koenig	08702002e8	use asm_offsets with vp8_fast_quantize_b_sse3 on the same order as the sse2 fast quantize change: ~2% except for 32bit. only a slight improvement there. Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b	2011-04-07 16:40:05 -04:00
James Berry	aec5487cdd	Use correct 32 bit comparisons for SAD breakout. Rax updated to eax to avoid uninitialized memory usage. Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a	2011-04-07 15:08:03 -04:00
Johann	2de858b9fc	Merge "use asm_offsets with vp8_fast_quantize_b_sse2"	2011-04-06 10:53:55 -07:00
Yunqing Wang	9e9f61a317	Merge "Minor modification"	2011-04-06 06:12:13 -07:00
Yunqing Wang	02423b2e92	Minor modification A small change. Change-Id: I2e7726e58370a95d0319361f4f6ad231138d1328	2011-04-06 09:08:47 -04:00
Johann	c32e0ecc59	use asm_offsets with vp8_fast_quantize_b_sse2 on the same order as the regular quantize change: ~2% Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86	2011-04-04 16:23:29 -04:00
Scott LaVarnway	f212a98ee7	Fixed unused variable warnings for firstpass.c Change-Id: I8378a9a541ade2f098359a7b20fa08e6c1596d80	2011-04-04 14:18:31 -04:00
Johann	610dd90288	Merge "tweak vp8_regular_quantize_b_sse2"	2011-04-04 08:56:25 -07:00
Yunqing Wang	f5c0d95e8c	Merge "Use full-pixel MV in mvsadcost calculation"	2011-04-04 08:40:51 -07:00
Yunqing Wang	3d6815817c	Use full-pixel MV in mvsadcost calculation MV sad cost error is only used in full-pixel motion search, which only needs full-pixel resolution instead of quarter-pixel resolution. This change reduced mvsadcost table size, and removed unnecessary parameter passing since this table is constant once it is generated. Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0	2011-04-01 16:41:58 -04:00
Johann	8520b5c785	tweak vp8_regular_quantize_b_sse2 rather than look up rc in the zig zag table, embed it in the macro. this also allows us to shuffle some values in the macro and keep *d in rsi gains of about the same order as the obj_int_extract implementation: ~2% Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d	2011-04-01 09:58:23 -04:00
Johann	ba11e24d47	Merge "Wrapper function removed from vp8_subtract_b_neon function call"	2011-04-01 05:47:21 -07:00
Tero Rintaluoma	cec76a36d6	Wrapper function removed from vp8_subtract_b_neon function call Address calculations moved from encodemb_arm.c file to neon optimized assembly function to save cycles in function calls. - vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon that contains all needed address calculations - unnecessary file encodemb_arm.c removed - consistent with ARMv6 optimized version Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b	2011-04-01 10:06:44 +03:00
Johann	9d138379a2	Merge "ARMv6 optimized subtract functions"	2011-03-31 08:40:10 -07:00
Attila Nagy	297b27655e	Runtime detection of available processor cores. Detect the number of available cores and limit the thread allocation accordingly. On decoder side limit the number of threads to the max number of token partitions. Core detection works on Windows and Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN. Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078	2011-03-31 10:23:01 +03:00
Attila Nagy	7d335868df	Fix: lpf semaphore was signaled in single threaded run After picking filter level, post the loopfilter semaphore just when multiple threads are in use. Change-Id: If7bfb64601d906adef703f454dafc25e978b93c6	2011-03-30 15:55:29 +03:00
Johann	0e43668546	Merge "Half pixel variance further optimized for ARMv6"	2011-03-29 12:14:54 -07:00
Yunqing Wang	534ea700bd	Merge "Fix a crash while enabling shared (--enable-shared)"	2011-03-29 09:04:22 -07:00
Yunqing Wang	b843aa4eda	Fix a crash while enabling shared (--enable-shared) Fixed a bug in SSSE3 sub-pixel filter functions. Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a	2011-03-29 11:31:06 -04:00
Johann	f0c22a3f33	use GLOBAL correctly on 32bit shared libraries http://code.google.com/p/webm/issues/detail?id=309 Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c	2011-03-29 11:27:03 -04:00
Tero Rintaluoma	6fdc9aa79f	ARMv6 optimized subtract functions Adds following ARMv6 optimized functions to encoder: - vp8_subtract_b_armv6 - vp8_subtract_mby_armv6 - vp8_subtract_mbuv_armv6 Gives 1-5% speed-up depending on input sequence and encoding parameters. Functions have one stall cycle inside the loop body on Cortex pipeline. Change-Id: I19cca5408b9861b96f378e818eefeb3855238639	2011-03-29 16:52:00 +03:00
Tero Rintaluoma	f5e433464b	Half pixel variance further optimized for ARMv6 Half pixel interpolations optimized in variance calculations. Separate function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided. On average, performance improvement is 6-7% for VGA@30fps sequences. Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628	2011-03-28 09:51:51 +03:00
Johann	beaafefcf1	Merge "use asm_offsets with vp8_regular_quantize_b_sse2"	2011-03-24 11:06:36 -07:00
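
Note on commit 297b27655e (runtime detection of available processor cores): the POSIX side of that change can be sketched roughly as below. This is a minimal illustration, not the code from the commit. The function names detect_core_count and limit_threads are hypothetical, the Windows path is omitted, and treating max_partitions <= 0 as "no partition limit" is an assumption made for the example.

```c
#include <unistd.h>

/* Hypothetical helper: number of online cores, falling back to 1.
 * Uses whichever sysconf name the platform defines, as the commit
 * message describes. */
int detect_core_count(void)
{
    long n = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    n = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROC_ONLN)
    n = sysconf(_SC_NPROC_ONLN);
#endif
    /* sysconf returns -1 on error; never report fewer than one core. */
    return n < 1 ? 1 : (int)n;
}

/* Hypothetical helper: cap the requested thread count by the core count
 * and, on the decoder side, by the number of token partitions.
 * Assumption: max_partitions <= 0 means no partition limit applies. */
int limit_threads(int requested, int cores, int max_partitions)
{
    int n = requested;
    if (n > cores)
        n = cores;
    if (max_partitions > 0 && n > max_partitions)
        n = max_partitions;
    return n < 1 ? 1 : n;
}
```

A caller would typically combine the two, for example limit_threads(requested, detect_core_count(), partitions), so that a request for more threads than cores or partitions is silently reduced.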

561 Commits