generic-library/vpx

Author	SHA1	Message	Date
Johann	b23bd2360f	The subfunctions are only defined for sse2 See highbd_subpel_variance_impl_sse2.asm Change-Id: Id13b97f4f6d189ed71cdc6d52b3c4ea63dc1da05	2016-05-06 18:58:49 -07:00
Johann	a761197fbd	Unlike non-hbd variance, opt2 is never used Change-Id: I1d342725df332c4efc6006d9e3dcb7372c41f448	2016-05-06 18:38:04 -07:00
Geza Lore	a0e1c23277	Add SSE2 versions of 128x128 vpx_sad* Encoder speedup with all experiments enabled approx 15%. Change-Id: Ib3c771d8da00989ddc9112b71b48ce7c5594e91a	2016-05-06 14:18:00 +01:00
James Zern	2184692c07	vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7	2016-05-04 15:06:44 -07:00
James Bankoski	89f905e5e5	Merge "libvpx: add a unit test for plane_add_noise."	2016-05-04 13:09:05 +00:00
Jim Bankoski	34d5aff747	libvpx: add a unit test for plane_add_noise. In so doing this fixes a couple of bugs: vpx_plane_add_noise.c needed to subtract a clamp instead of add. And the assembly (mmx sse) had assumptions that parameters were continuous in memory which was not true. Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5	2016-05-03 16:23:06 -07:00
James Bankoski	e755a283dd	Merge "Move vpx_add_plane from codec to vpx_dsp and dedup."	2016-05-03 14:11:57 +00:00
Jim Bankoski	fce3cee8dd	Move vpx_add_plane from codec to vpx_dsp and dedup. Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7	2016-05-02 12:17:39 -07:00
Yi Luo	299c5fc202	HBD hybrid transform 8x8 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Update bit-exact unit test against current C version. - HBD encoder speed improves ~3.8%. Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec	2016-04-29 17:04:52 -07:00
Alex Converse	a68b24fdee	Tweak casts on vpx_sub_pixel_variance to avoid implicit overflow. Change-Id: I481eb271b082fa3497b0283f37d9b4d1f6de270c	2016-04-27 16:37:18 -07:00
Alex Converse	6c4007be1c	Be explicit about overflow in vpx_variance16x16_sse2. The product always fits in uint32_t, but the operands don't. An optimizing compiler should generate the wraparound code. (Verified with clang). Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c	2016-04-27 15:22:17 -07:00
Alex Converse	ccb894ce73	Remove casts on < 16x16 variance. These blocks will never overflow since max sum is +/-255wh. Change-Id: Ia2c630339fd9cfb411b56b6040ff402095f12a2e	2016-04-27 15:21:58 -07:00
Yaowu Xu	ed04e82a04	Merge branch 'master' into nextgenv2 Conflicts: vp10/common/scan.c vp9/common/vp9_pred_common.c vp9/decoder/vp9_decoder.c Change-Id: Id559d98ea676da15d60ed464ddb6c48d3eed1111	2016-04-18 15:15:05 -07:00
Yi Luo	6db95602e4	Merge "Optimized HBD block subtraction for all block sizes" into nextgenv2	2016-04-12 21:22:32 +00:00
Yi Luo	0f80b1f754	Optimized HBD block subtraction for all block sizes - Interface function takes a local MxN function to call based on the block size. - Repetition call (w/o cache line miss) shows improvement: ~63% - ~340%. - Overall encoder speed improvement: ~0.9%. Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f	2016-04-12 12:04:43 -07:00
Yi Luo	e5f4e8eab9	Some cosmetic improvements since HBD variance 4x4 optimization Change-Id: I414c1fabd2e3a9b1d9daa8a90f85a0bace8bd3cd	2016-04-08 10:32:13 -07:00
James Zern	38bc1d0f4b	vpx_fdct16x16_1_sse2: improve load pattern load the full row rather than doing 2 8-wide columns Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95	2016-04-04 16:03:42 -07:00
James Zern	3735def667	vpx_fdctNxN_1_sse2: reduce store size only output[0] needs to be set, store_output is more involved than a movdqa in the high bitdepth case Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67	2016-04-04 16:02:06 -07:00
Yi Luo	250935cab3	Optimized HBD 4x4 variance calculation vpx_highbd_8/10/12_variance4x4_sse4_1 improves performance ~7%-11%. Change-Id: Ida22bb2a2f7a58037cfd73e186d4f6267a960c02	2016-04-04 11:28:59 -07:00
Geza Lore	552d5cd715	Extend superblock size to 128x128 pixels. If --enable-ext-partition is used at build time, the superblock size (sometimes also referred to as coding unit (CU) size) is extended to 128x128 pixels. Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a	2016-03-30 18:23:06 +01:00
Yaowu Xu	c810740c36	Merge branch 'masterbase' into nextgenv2 Conflicts: vp9/encoder/vp9_encoder.c vpx_dsp/x86/convolve.h Change-Id: I60c3532936bedd796a75dfe78245a95ec21e2e55	2016-03-28 17:44:28 -07:00
Yunqing Wang	5f5552d846	Optimize HBD up-sampled prediction functions Optimized 2 up-sampled reference prediction functions in high-bit depth case. This reduced the HBD encoding time by 3%. Change-Id: I8663ffb5234f5e70168c0fc9ca676309fe8e98f2	2016-03-14 19:04:33 -07:00
Yunqing Wang	e6e2d886d3	Add high-precision sub-pixel search as a speed feature Using the up-sampled reference frames in sub-pixel motion search is enabled as a speed feature for good-quality mode speed 0 and speed 1. Change-Id: Ieb454bf8c646ddb99e87bd64c8e74dbd78d84a50	2016-03-11 16:32:11 -08:00
Scott LaVarnway	67c4c8244a	VPX: loopfilter_mmx.asm using x86inc 2 This reverts commit 9aa083d164e0d39086aa0c83f0d1a0d0f0d1ba61. Fixes a decoder mismatch with 32bit PIC builds. Change-Id: I94717df662834810302fe3594b38c53084a4e284	2016-03-08 04:24:47 -08:00
Geza Lore	938b8dfc73	Extend convolution functions to 128x128 for ext-partition. Change-Id: I7f7e26cd1d58eb38417200550c6fbf4108c9f942	2016-03-07 11:39:27 +00:00
James Zern	9aa083d164	Revert "VPX: loopfilter_mmx.asm using x86inc" This reverts commit 15ecdc3970462c15fdf7185d373cb52664f40c0f. breaks 32-bit pic builds Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a	2016-03-04 18:23:45 -08:00
Geza Lore	697bf5beff	Add 128 pixel variance and SAD functions Change-Id: I8fde245b32c9e586683a28aa6925da0b83850b39	2016-03-03 10:24:29 +00:00
Debargha Mukherjee	1d69ceee5c	Adds masked variance and sad functions for wedge Adds masked variance and sad functions needed for wedge prediction modes to come. Change-Id: I25b231bbc345e6a494316abb0a7d5cd5586a3a54	2016-03-01 17:28:56 -08:00
Yunqing Wang	342a368fd4	Do sub-pixel motion search in up-sampled reference frames Up-sampled the reference frames to 8 times in each dimension using the 8-tap interpolation filter. In sub-pixel motion search, use the up-sampled reference frames to find the best matching blocks. This largely improved the motion search precision, and thus, improved the compression quality. There was no change in decoder side. Borg test and speed test results: 1. On derflr set, Overall PSNR gain: 1.306%, and SSIM gain: 1.512%. Average speed loss on derf set was 6.0%. 2. On stdhd set, Overall PSNR gain: 0.754%, and SSIM gain: 0.814%. On hevchd set, Overall PSNR gain: 0.465%, and SSIM gain: 0.527%. Speed loss on HD clips was 3.5%. Change-Id: I300ebaafff57e88914f3dedc8784cb21d316b04f	2016-02-29 12:14:47 -08:00
Scott LaVarnway	dd6729f826	VPX: Remove pmin/pmax from subpixel functions. These instructions are unnecessary if the adds are done in the correct order. Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71	2016-02-27 05:47:56 -08:00
Scott LaVarnway	51beb29f52	Merge "VPX: vpx_filter_block1d16_(v8, v8_avg)"	2016-02-27 13:31:18 +00:00
James Zern	654d2163c9	x86/convolve.h: remove redundant check in FUN_CONV_2D the filter will be the same in this case Change-Id: I95159bcb05bbfb71b57da741393e80cc7ffc5cff	2016-02-25 23:31:50 -08:00
James Zern	6d8c8c6201	x86/convolve.h: replace while w/if for w < 16 in non-hbd configurations; any high-bitdepth changes will be done in a follow-up Change-Id: Ia74e30971b744c1faab68c92fdeda1a053988c77	2016-02-25 21:44:06 -08:00
Scott LaVarnway	1f736e400f	VPX: vpx_filter_block1d16_(v8, v8_avg) Store result with one 16 byte store instead of two 8 byte stores. Change-Id: I43acbc5edfd6d6055a926f9b9605d47127400f09	2016-02-25 06:15:24 -08:00
James Zern	b3ceb629ba	x86/convolve.h: change filter[] || chains to | Change-Id: I661f64390f232826857b259e7a67e77f5a3a91ad	2016-02-24 19:47:43 -08:00
Yaowu Xu	aa6c754635	Merge remote-tracking branch 'webm/master' into nextgenv2	2016-02-24 10:53:17 -08:00
Scott LaVarnway	06d0e2fe6c	BUG FIX: vpx_filter_block1d(8,4)_(v8, v8_avg) Change-Id: Ic7ea79988ed0864e7ddbfeb312516bcf77eaaac1	2016-02-23 12:23:41 -08:00
Scott LaVarnway	15ecdc3970	VPX: loopfilter_mmx.asm using x86inc Change-Id: Idcf29281d617b275e3ca50f77e6d00c60992a36d	2016-02-18 15:34:58 -08:00
James Zern	9b44d9d00f	split vpx_highbd_lpf_horizontal_16 in two replace with vpx_highbd_lpf_horizontal_edge_16 and vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d	2016-02-16 23:13:58 -08:00
James Zern	1b519fb666	split vpx_lpf_horizontal_16 in two replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to avoid passing a count parameter Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271	2016-02-16 22:57:45 -08:00
James Zern	e7a23d703b	vpx_highbd_lpf_horizontal_4: remove unused count param Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc	2016-02-16 22:57:45 -08:00
James Zern	5171857329	vpx_highbd_lpf_horizontal_8: remove unused count param Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf	2016-02-16 22:57:44 -08:00
James Zern	3c1019e49d	vpx_highbd_lpf_vertical_4: remove unused count param Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774	2016-02-16 22:57:44 -08:00
James Zern	72a9f06ac2	vpx_highbd_lpf_vertical_8: remove unused count param Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c	2016-02-16 22:57:44 -08:00
James Zern	b1e97c6a25	vpx_lpf_horizontal_4: remove unused count param Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11	2016-02-16 22:57:27 -08:00
James Zern	bd5a5bb561	vpx_lpf_horizontal_8: remove unused count param Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46	2016-02-16 22:54:40 -08:00
James Zern	109a47b342	vpx_lpf_vertical_4: remove unused count param Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c	2016-02-16 14:59:00 -08:00
James Zern	37225744db	vpx_lpf_vertical_8: remove unused count param Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c	2016-02-16 14:59:00 -08:00
Geza Lore	abd00505d1	Add optimized vpx_sum_squares_2d_i16 for vp10. Using this we can eliminate large numbers of calls to predict intra, and is also faster than most of the variance functions it replaces. This is an equivalence transform so coding performance is unaffected. Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all enabled. Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb	2016-02-15 16:54:52 +00:00
Yaowu Xu	0aef1bc898	Enable sse2 version of inverse wht for hbd build Change-Id: If8f5efd701a11c8a7ad3078d10ec3cd0fe27667e	2016-01-29 14:47:56 -08:00

229 Commits