generic-library/vpx

Author	SHA1	Message	Date
Jian Zhou	8366b414dd	Code clean of highbd_dc_predictor_4x4 MMX replaced with SSE2, same performance. Change-Id: Ic57855254e26757191933c948fac6aa047fadafc	2015-12-18 12:45:23 -08:00
Jian Zhou	789dbb3131	Code clean of sad4xNx4D_sse Replace MMX with SSE2. Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6	2015-12-17 17:43:46 -08:00
Jian Zhou	b158d9a649	Code clean of sad4xN(_avg)_sse Replace MMX with SSE2, reduce psadbw ops which may help Silvermont. Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d	2015-12-17 11:10:42 -08:00
James Zern	b81f04a0cc	Merge "move vp9_avg to vpx_dsp"	2015-12-15 03:41:22 +00:00
James Zern	d36659cec7	move vp9_avg to vpx_dsp Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f	2015-12-14 14:42:12 -08:00
Jian Zhou	88120481a4	Code clean of tm_predictor_32x32 Reallocate the xmm register usage so that no ARCH_X86_64 required. Reduce memory access to the left neighbor by half. Speed up by single digit on big core machine. Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990	2015-12-11 10:32:08 -08:00
Jian Zhou	c90a8a1a43	SSE2 based h_predictor_32x32 Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8, and reduce mem access to left. Speed up by single digit in ./test_intra_pred_speed on big core machines. Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960	2015-12-10 10:09:58 -08:00
Jacky Chen	d9bba21306	Merge "Add vp9_avg_4x4_neon and the unit test."	2015-12-09 06:09:33 +00:00
James Zern	3dc19feb29	Merge changes Id3c6cf5c,I7970575e,If3253a87 * changes: test.mk: simplify vp8/9 checks test.mk: regroup white box tests test.mk: enable test_intra_pred_speed unconditionally	2015-12-09 01:39:45 +00:00
jackychen	303f144eef	Add vp9_avg_4x4_neon and the unit test. Change-Id: I3ef9a9648841374ed3cc865a02053c14ad821a20	2015-12-08 17:23:36 -08:00
Jian Zhou	aa5b517a39	Re-enable SSE2 based intra 4x4 prediction 4x4 Intra predictor implemented with MMX is replaced with SSE2. Segfault in change 315561 when decoding vp8 is taken care of. Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76	2015-12-07 18:50:37 -08:00
James Zern	79a9add666	Revert "MMX in intra 4x4 prediction replaced with SSE2" This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a. This causes a segfault when decoding vp8, in both 32 and 64-bit Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367	2015-12-05 10:20:39 -08:00
James Zern	a046ba21d8	test.mk: simplify vp8/9 checks use CONFIG_VP[89] to protect white-box tests and drop redundant uses of CONFIG_VP9 in variable assignments within that block Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447	2015-12-04 18:44:45 -08:00
James Zern	2c9c2e0b8b	test.mk: regroup white box tests vp8/9/10/multi-config/unconditional Change-Id: I7970575e997da0b68c6c54741a221fbba5ad0b08	2015-12-04 18:44:34 -08:00
Jian Zhou	e86c7c863e	Speed up h_predictor_16x16 Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ie48229c2e32404706b722442942c84983bda74cc	2015-12-04 12:12:55 -08:00
Jian Zhou	da3f08fac3	Speed up h_predictor_8x8 Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2, and reduce mem access to left. Speed up by >20% in ./test_intra_pred_speed. Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e	2015-12-04 11:36:44 -08:00
Jian Zhou	aa2764abdd	MMX in intra 8x8 prediction replaced with SSE2 8x8 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52	2015-12-03 18:11:06 -08:00
Jian Zhou	89a1efa4c4	MMX in intra 4x4 prediction replaced with SSE2 4x4 Intra predictor implemented with MMX is replaced with SSE2. Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc	2015-12-03 16:40:23 -08:00
Jian Zhou	623e988add	Merge "SSE2 speed up of h_predictor_4x4"	2015-12-02 18:49:00 +00:00
Jian Zhou	9d29d76280	SSE2 speed up of h_predictor_4x4 Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d	2015-11-30 10:08:05 -08:00
James Zern	1138b986c9	test.mk: enable test_intra_pred_speed unconditionally vpx_dsp is currently included in all configurations Change-Id: If3253a87d27f3e1abc94fbfe76f978c1172f3762	2015-11-24 22:29:12 -08:00
James Zern	fd51d90159	Merge changes Iaf8cbe95,I6748183d,I2a49811d * changes: add vp9_satd_neon fix vp9_satd_sse2 vp9_satd: return an int	2015-11-25 01:48:53 +00:00
Alex Converse	022c848b4d	Change highbd variance rounding to prevent negative variance. Always round sum error and sum square error toward zero in variance calculations. This prevents variance from becoming negative. Avoiding rounding variance at all might be better but would be far more invasive. Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c	2015-11-24 16:32:01 -08:00
James Zern	eb1d0f8d60	add vp9_satd_neon ~60-65% faster at the function level across block sizes Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893	2015-11-24 16:09:10 -08:00
Marco	b0027b96ae	vp9-svc: Fix to allow setting qp-max/min per spatial and temporal layer. Change-Id: Ic0ec32c1d7f7c08c9f956592dccbfd9060b1f624	2015-11-23 10:46:34 -08:00
James Zern	60760f710f	fix vp9_satd_sse2 accumulate satd in 32-bits + add unit test Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5	2015-11-20 14:35:46 -08:00
James Zern	3e0138edb7	vp9_satd: return an int the final sum may use up to 26 bits + add a unit test + disable the sse2 as the result will rollover; this will be fixed in a future commit Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce	2015-11-20 14:35:38 -08:00
Jian Zhou	4993158ee5	Merge "Speed up tm_predictor_4x4"	2015-11-19 02:32:48 +00:00
Jian Zhou	79b68626ae	Speed up tm_predictor_4x4 tm_predictor_4x4 is implemented with SSE2 using XMM registers. Speed up by ~25% in ./test_intra_pred_speed. Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0	2015-11-18 16:44:25 -08:00
jackychen	204cde580a	Enable resize test(down&up) by changing the bitrate. Change-Id: I5a4f1f7b9de20fbfc28cb743dcd29c0eeca736f8	2015-11-13 16:46:00 -08:00
Marco	006fd19246	Fix resize internal test. Temporary fix to make sure it always passes. Change-Id: I56a0529986ad7049b6090f871c14e9e06d573d5f	2015-11-13 06:22:27 -08:00
Marco	419da5c734	Adjust variance threshold for 16x16 split at low resolutions. Change-Id: I635e37f81237e9703d7d9a11ed76a043f4ec6eb0	2015-11-12 17:58:31 -08:00
jackychen	55c8843791	VP9: add unit test for realtime external resize. Change-Id: I9bfa80de73847d9be88b6ce9865d7bb5fafaaa57	2015-11-09 16:48:18 -08:00
jackychen	0465aa45ea	VP9 dynamic resize: enable resize unit test(DownUp). The unit test requires a longer clip which is already in the repo. Change-Id: Ic42e8d83e636fafd20d485a7f5f8422835319245	2015-11-09 14:04:58 -08:00
jackychen	3c9a424e6e	VP9 dynamic resize: increase waiting time after key frame. For 1 pass CBR mode: increase waiting time after key frame before we start sampling rate control behavior for determining resize. This change needs to temporarily disable one internal resize test (DownUp), since that test requires a longer clip. Change-Id: If21beda1be23f169ee541ab4dd642f718347887a	2015-11-09 12:04:00 -08:00
James Zern	837cea40fc	variance_test: create fn pointers w/'&' ref this helps some toolchains (vs9) resolve the type of the parameter Change-Id: I8c83b86da53b1783cd18c0f765b67ba33da91d72	2015-11-06 11:04:11 -08:00
James Zern	ab5ce2e5ae	sixtap_predict_test: create fn pointers w/'&' ref this helps some toolchains (vs9) resolve the type of the parameter Change-Id: Ic53b2ed5fbce05c5b5e633b4a4ef9ea75c55360a	2015-11-06 11:04:10 -08:00
James Zern	91606bbbe6	sad_test: create fn pointers w/'&' ref this helps some toolchains (vs9) resolve the type of the parameter Change-Id: I4acc8a844d1e55b766f66482bd6d32998174d70f	2015-11-05 23:53:24 -08:00
James Zern	892130f75b	vp9_spatial_svc_encoder.sh: fix command line param -l -> -sl, renamed in: be3b08d [svc] Temporal svc with two pass rate control Change-Id: I5a7b179b33d94e20e54825090659156dece928c0	2015-11-05 15:22:39 -08:00
Marco	cb7b2a4f4b	Adjust threshold for datarate frame drop test. Current threshold is a little too strict. Change-Id: I99ec1409d095e0c2fd3b7ab398742cabcc05700b	2015-11-03 08:17:21 -08:00
James Zern	ca163b85bb	vp9_dx_iface: move struct defs to separate header this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in vp9_encoder_parms_get_to_decoder.cc Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05	2015-10-29 17:55:35 -07:00
jackychen	d464e8a462	VP9 decoder: Add more test vectors for resizing. Refer to doc "vp9-test-vectors". BUG=https://code.google.com/p/webm/issues/detail?id=1086 Change-Id: I523d1f39141a3a86f113604cbdb9cd41cc2d6470	2015-10-28 21:26:00 -07:00
Hangyu Kuang	bd45af8bbb	Add more resize test videos with larger resolution change intervals. These videos change resolution every 10 frames versus every 3 frames in current test sets. Change-Id: Ic33f449fc9b6d2f480825d4715b8f63e70801232	2015-10-28 10:57:30 -07:00
Hangyu Kuang	f5f19a1fbd	Merge "Add several new test vectors with small resolution."	2015-10-28 15:04:25 +00:00
Hangyu Kuang	0771a30e9e	Add several new test vectors with small resolution. Change-Id: I70b1b8162a0c9b8501358ba7d32fecd1dc020ab5	2015-10-27 17:46:48 -07:00
Debargha Mukherjee	35cae7f1b3	Merge "Optimize vp9_highbd_block_error_8bit assembly."	2015-10-26 18:03:46 +00:00
Ronald S. Bultje	aa11256555	Adjust superframe-is-optional unit test for vp10 superframe syntax. Change-Id: Ic64b6928af7ae8ecc987f845b0bf0faecdacb072	2015-10-21 22:27:28 -04:00
Geza Lore	aa8f85223b	Optimize vp9_highbd_block_error_8bit assembly. A new version of vp9_highbd_error_8bit is now available which is optimized with AVX assembly. AVX itself does not buy us too much, but the non-destructive 3 operand format encoding of the 128bit SSEn integer instructions helps to eliminate move instructions. The Sandy Bridge micro-architecture cannot eliminate move instructions in the processor front end, so AVX will help on these machines. Further 2 optimizations are applied: 1. The common case of computing block error on 4x4 blocks is optimized as a special case. 2. All arithmetic is speculatively done on 32 bits only. At the end of the loop, the code detects if overflow might have happened and if so, the whole computation is re-executed using higher precision arithmetic. This case however is extremely rare in real use, so we can achieve a large net gain here. The optimizations rely on the fact that the coefficients are in the range [-(2^15-1), 2^15-1], and that the quantized coefficients always have the same sign as the input coefficients (in the worst case they are 0). These are the same assumptions that the old SSE2 assembly code for the non high bitdepth configuration relied on. The unit tests have been updated to take this constraint into consideration when generating test input data. Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7	2015-10-21 12:30:40 +01:00
Yaowu Xu	568429512e	Add a new enum type vpx_color_range_t to make meaning of color_range obvious. Change-Id: I303582e448b82b3203b497e27b22601cc718dfff	2015-10-16 16:27:18 -07:00
Alex Converse	0c00af126d	Add vpx_highbd_convolve_{copy,avg}_sse2 single-threaded: swanky (silvermont): ~1% faster overall peppy (celeron,haswell): ~1.5% faster overall Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073	2015-10-09 11:50:25 -07:00
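
Note on aa8f85223b ("Optimize vp9_highbd_block_error_8bit assembly"): the commit message describes doing all arithmetic speculatively in 32 bits and re-running the computation at higher precision only if overflow might have happened. The following is a minimal scalar sketch of that idea in C, not the AVX assembly from the commit. The function names `block_error_wide` and `block_error_speculative` are hypothetical, and the overflow check used here (a carry test on each addition) is an assumption chosen for clarity; the actual assembly may detect overflow differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slow path: exact accumulation in 64 bits. Hypothetical helper. */
static uint64_t block_error_wide(const int16_t *coeff, const int16_t *dqcoeff,
                                 size_t n, uint64_t *ssz) {
  uint64_t err = 0, sq = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += (uint64_t)(d * d);
    sq += (uint64_t)((int64_t)coeff[i] * coeff[i]);
  }
  *ssz = sq;
  return err;
}

/* Fast path: accumulate in 32 bits and remember whether any addition
 * wrapped. If one did, discard the result and recompute in 64 bits. */
static uint64_t block_error_speculative(const int16_t *coeff,
                                        const int16_t *dqcoeff, size_t n,
                                        uint64_t *ssz) {
  uint32_t err = 0, sq = 0;
  int overflowed = 0;
  assert(ssz != NULL);
  for (size_t i = 0; i < n; ++i) {
    /* The difference of two int16_t values fits in 17 bits, so its square
     * fits in an unsigned 32-bit value; unsigned math avoids signed
     * overflow. */
    const uint32_t d = (uint32_t)((int32_t)coeff[i] - dqcoeff[i]);
    const uint32_t c = (uint32_t)(int32_t)coeff[i];
    const uint32_t new_err = err + d * d;
    const uint32_t new_sq = sq + c * c;
    /* An unsigned sum smaller than its left operand means a carry out. */
    overflowed |= (new_err < err) | (new_sq < sq);
    err = new_err;
    sq = new_sq;
  }
  if (overflowed) return block_error_wide(coeff, dqcoeff, n, ssz);
  *ssz = sq;
  return err;
}
```

For typical blocks the 32-bit loop finishes without a carry and its result is returned directly; only inputs with many large coefficients pay for the second pass, which is the trade-off the commit message describes as rare in real use.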


1606 Commits