generic-library/vpx

Author	SHA1	Message	Date
levytamar82	efdfdf5787	32 Align Load bug: in sub_pixel_avg_variance, the parameter sec was also read with an aligned load; it is now loaded unaligned. Change-Id: I4d4966e0291059ea4d705baed1503dc58444fcb7	2014-08-14 14:07:28 -07:00
levytamar82	69a5f5ecf7	Fix bug 807: in the sub_pixel_variance function, dst is aligned to 16 bytes, not 32 bytes, so the data is now loaded unaligned. Change-Id: I2e0b9745543697efc56fefa32857ea10117af135	2014-08-07 18:51:02 -07:00
levytamar82	af10457e02	Fix bug 806: in the functions sad32x32x4d and sad64x64x4d, the source is aligned to 16 bytes, not 32 bytes, so the load is now unaligned. Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c	2014-08-07 14:13:30 -07:00
levytamar82	4ba92dc5ab	Fix bug 805: remove all the redundant dct functions (dct4x4, dct8x8) in avx2 except dct32x32; those functions were originally copied from dct_sse2. Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e	2014-07-28 15:46:01 -07:00
Jingning Han	9ad1b9fc67	Re-design quantization process for 32x32 transform block This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gain of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b	2014-07-08 16:55:28 -07:00
Jingning Han	00fc0e3ff5	Tune SSSE3 implementation of fast path quantization This commit further simplifies the SSSE3 implementation of the fast path quantization process. Change-Id: I5be3286ec0f1bd81d1cf5be3168fece6384fb9ca	2014-07-07 11:06:53 -07:00
Jingning Han	9ac2f66320	Re-design quantization process This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles to 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
Jingning Han	d5ae43318e	Merge "Fast computation path for forward transform and quantization"	2014-06-12 11:59:52 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
Dmitry Kovalev	e6fadb5ba8	Merge "Cleaning up vp9_variance_mmx.c."	2014-06-10 17:27:12 -07:00
Jingning Han	540d910350	Fix potential overflow issue in SSSE3 forward 8x8 2D-DCT The SSSE3 implementation might overflow in its second 1-D transform if all input residual pixels are close to 255. This commit fixes the issue and re-enables the unit test on the SSSE3 version. Change-Id: I0520478abdab7afd3ff2842516bec951111e9b3c	2014-06-03 14:21:47 -07:00
Yaowu Xu	d553cc10dc	Merge "Fixed a crash windows build"	2014-05-29 08:16:19 -07:00
Yaowu Xu	43414f3f7b	Fixed a crash windows build Change-Id: I58baa1da1f3bfc8a6da454399139fe6a7473ff10	2014-05-28 15:50:50 -07:00
Dmitry Kovalev	ac3d97f124	Cleaning up vp9_variance_mmx.c. Change-Id: I42d83f91e272c92daed604c233f74439fe6307c5	2014-05-28 12:03:55 -07:00
Dmitry Kovalev	a789bfec87	Cleaning up vp9_variance_sse2.c. Change-Id: I5ec336848f6489c31cf2b645026fa2025db07466	2014-05-27 13:53:19 -07:00
Dmitry Kovalev	72ab966d5e	Removing vp9_pragmas.h. Change-Id: I9120a87e27e73e496932d11716937e2fad246521	2014-05-22 13:46:31 -07:00
Deb Mukherjee	b59b324171	Merge "Renames x86_64 specific asm files"	2014-05-22 12:30:38 -07:00
Deb Mukherjee	e272273443	Renames x86_64 specific asm files Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536	2014-05-21 13:55:56 -07:00
Jingning Han	d8b26caa71	Merge "Adjust the forward 16x16 DCT computation steps"	2014-05-21 09:16:04 -07:00
Deb Mukherjee	a185bc3350	Extends temporal filtering to work for 422 data This is needed for profiles 1 and 2. Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028	2014-05-20 15:19:40 -07:00
Jingning Han	7f547336b7	Adjust the forward 16x16 DCT computation steps This commit adjusts the forward 16x16 DCT computation steps to simplify the register level operations. It fixes the corresponding sse2 version accordingly. Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08	2014-05-19 12:39:26 -07:00
Yunqing Wang	c661cf0dad	Merge "AVX2 To VP9 Block Error Optimization"	2014-05-15 11:29:29 -07:00
levytamar82	1fbab853c8	AVX2 To VP9 Block Error Optimization vp9_block_error_sse2 can only handle 16 bytes at a time, but the function needs to handle a sequence of 32 bytes at a time, so each 16 bytes is handled in a different register. With the AVX2 optimization, the 32 bytes can be handled in one register instead of the two used by SSE2. vp9_block_error was optimized by 85%. The user level was optimized by 1.2%. Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd	2014-05-14 11:51:07 -07:00
Alex Converse	b5422fab46	Add an x86inc MMX fwht4x4. Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197	2014-05-08 12:01:27 -07:00
Dmitry Kovalev	68a600d82a	Merge "Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c."	2014-05-07 13:34:05 -07:00
Paul Wilkins	33b1c457ed	Revert "Add an MMX fwht4x4" Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit `89fbf3de50`. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd	2014-05-07 12:53:27 +01:00
Alex Converse	75d05d5ed4	Merge "Add an MMX fwht4x4"	2014-05-06 11:12:27 -07:00
Alex Converse	89fbf3de50	Add an MMX fwht4x4 7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64	2014-05-05 15:10:48 -07:00
Jingning Han	52ae97b6aa	SSSE3 implementation of full inverse 8x8 2D-DCT This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT and reconstruction. It brings the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3	2014-05-05 10:49:27 -07:00
Dmitry Kovalev	25a666ef39	Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c. Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a	2014-05-01 16:45:49 -07:00
Dmitry Kovalev	e05b92c0aa	Merge "Removing half-variance asm functions which are not used."	2014-05-01 14:50:45 -07:00
Jingning Han	39761eb5d6	Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT"	2014-04-30 13:41:36 -07:00
Dmitry Kovalev	94f5491c46	Removing half-variance asm functions which are not used. Corresponding C functions were removed in I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3 Change-Id: I50a5575065a7a9e41904eb2161afd739def927db	2014-04-30 12:21:54 -07:00
Jingning Han	1eaa3a76dc	Enable SSSE3 implementation of 8x8 forward 2D-DCT Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4	2014-04-29 15:49:18 -07:00
Dmitry Kovalev	6e01079cc0	Removing unused vp9_variance_halfpixvar*() functions. Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3	2014-04-25 11:50:07 -07:00
Dmitry Kovalev	2fc3a18653	Removing unused vp9_mcomp_x86.h file. We don't use declarations from this file. The real declarations (differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad. Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0	2014-04-14 11:32:58 -07:00
levytamar82	0fa8b668c1	AVX2 SAD Optimization: 2 functions were optimized for AVX2 by using the full 256-bit register in order to handle 32 elements in parallel instead of only 16: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d. The function-level gain is 66% and the user-level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb	2014-03-21 13:53:32 -07:00
Yaowu Xu	5511968f21	Removed several unused functions. Change-Id: Ib9e27298c575afc02a98b593bc6ad60762064d9b	2014-03-17 14:09:29 -07:00
Andrew Russell	e337322e63	Merge "improved speed of 4x4 sse2 fdct."	2014-03-05 14:35:44 -08:00
Andrew Russell	a46f5459c3	improved speed of 4x4 sse2 fdct. * speed improvement of 30 percent achieved * multiplies and adds remain the same * non-arithmetic instructions minimized by hand, by: -expanding 2 pass loop -removing irrelevant "shuffles" -combining last two rounding steps * further improvements may be possible Change-Id: Idec2c3f52910c48e6a0e0f9aefed5cae31b0b8c0	2014-03-03 14:25:42 -08:00
levytamar82	ea14909687	AVX2 SubPixel AVG Variance Optimization Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_avg_variance64x64 2. vp9_sub_pixel_avg_variance32x32. Both of those functions were calling vp9_sub_pixel_avg_variance16xh_ssse3; instead of calling that function, they now call vp9_sub_pixel_avg_variance32xh_avx2, which is written in AVX2 and processes 32 elements in parallel. This optimization gave an 80% function-level gain and a 2% user-level gain. Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8	2014-02-28 22:51:04 -07:00
James Zern	d12b39daab	vp9_subpel_variance_impl_intrin_avx2.c: make some tables static + fix formatting Change-Id: I7b4ec11b7b46d8926750e0b69f7a606f3ab80895	2014-02-18 20:42:49 -08:00
levytamar82	52dac5d1cb	AVX2 SubPixel Variance Optimization Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_variance64x64 2. vp9_sub_pixel_variance32x32. Both of those functions were calling vp9_sub_pixel_variance16xh_ssse3; instead of calling that function, they now call vp9_sub_pixel_variance32xh_avx2, which is written in AVX2 and processes 32 elements in parallel. This optimization gave a 70% function-level gain and a 2% user-level gain. Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde	2014-02-14 16:59:11 -07:00
Andrew Russell	549c31f8ae	minor spelling cleanup in comments Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06	2014-02-12 16:32:51 -08:00
Yunqing Wang	0d43bd77e5	Bug fix in ssse3 quantize function A bug was reported in Issue 702: "SIGILL (Illegal instruction) when transcoding with vp9 - using FFmpeg". It was reproduced and fixed. Change-Id: Ie32c149a89af02856084aeaf289e848a905c7700	2014-02-07 14:32:30 -08:00
Dmitry Kovalev	005fc6970b	Finally removing "short" from transform names. Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d	2014-02-06 11:54:15 -08:00
Dmitry Kovalev	ff41764920	Removing _1d suffix from transform names. It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0	2014-01-27 16:15:36 -08:00
James Zern	b453941caf	vp9/encoder: add extern "C" to headers Change-Id: I4f51ce859a97bf1b8fd2b37ac585b7c643232b69	2014-01-23 16:21:24 -08:00
levytamar82	357b65369f	AVX2 Variance Optimization Optimizing the variance functions vp9_variance16x16, vp9_variance32x32, vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and vp9_mse16x16 by migrating them to AVX2. Some of the functions were optimized by processing 32 elements instead of 16. Some of the functions were optimized by processing 2 loop strides of 16 elements in a single 256-bit register. This optimization gives a user-level performance gain of between 2.4% and 2.7% and a function-level gain of 42%. Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d	2014-01-08 12:05:53 -07:00
James Zern	bd9a388a06	vp9: normalize include guards Change-Id: If4ddbdcfb3ab387cbca6910b42cf4df8111e6879	2013-12-16 19:40:49 -08:00

165 Commits