generic-library/vpx

Author	SHA1	Message	Date
Dmitry Kovalev	e6fadb5ba8	Merge "Cleaning up vp9_variance_mmx.c."	2014-06-10 17:27:12 -07:00
Jingning Han	540d910350	Fix potential overflow issue in SSSE3 forward 8x8 2D-DCT The SSSE3 implementation might find a potential overflow issue in its second 1-D transform, if all input residual pixels are close to 255. This commit fixes the issue and re-enables the unit test on the SSSE3 version. Change-Id: I0520478abdab7afd3ff2842516bec951111e9b3c	2014-06-03 14:21:47 -07:00
Yaowu Xu	d553cc10dc	Merge "Fixed a crash windows build"	2014-05-29 08:16:19 -07:00
Yaowu Xu	43414f3f7b	Fixed a crash windows build Change-Id: I58baa1da1f3bfc8a6da454399139fe6a7473ff10	2014-05-28 15:50:50 -07:00
Dmitry Kovalev	ac3d97f124	Cleaning up vp9_variance_mmx.c. Change-Id: I42d83f91e272c92daed604c233f74439fe6307c5	2014-05-28 12:03:55 -07:00
Dmitry Kovalev	a789bfec87	Cleaning up vp9_variance_sse2.c. Change-Id: I5ec336848f6489c31cf2b645026fa2025db07466	2014-05-27 13:53:19 -07:00
Dmitry Kovalev	72ab966d5e	Removing vp9_pragmas.h. Change-Id: I9120a87e27e73e496932d11716937e2fad246521	2014-05-22 13:46:31 -07:00
Deb Mukherjee	b59b324171	Merge "Renames x86_64 specific asm files"	2014-05-22 12:30:38 -07:00
Deb Mukherjee	e272273443	Renames x86_64 specific asm files Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536	2014-05-21 13:55:56 -07:00
Jingning Han	d8b26caa71	Merge "Adjust the forward 16x16 DCT computation steps"	2014-05-21 09:16:04 -07:00
Deb Mukherjee	a185bc3350	Extends temporal filtering to work for 422 data This is needed for profiles 1 and 2. Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028	2014-05-20 15:19:40 -07:00
Jingning Han	7f547336b7	Adjust the forward 16x16 DCT computation steps This commit adjusts the forward 16x16 DCT computation steps to simplify the register level operations. It fixes the corresponding sse2 version accordingly. Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08	2014-05-19 12:39:26 -07:00
Yunqing Wang	c661cf0dad	Merge "AVX2 To VP9 Block Error Optimization"	2014-05-15 11:29:29 -07:00
levytamar82	1fbab853c8	AVX2 To VP9 Block Error Optimization vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd	2014-05-14 11:51:07 -07:00
Alex Converse	b5422fab46	Add an x86inc MMX fwht4x4. Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197	2014-05-08 12:01:27 -07:00
Dmitry Kovalev	68a600d82a	Merge "Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c."	2014-05-07 13:34:05 -07:00
Paul Wilkins	33b1c457ed	Revert "Add an MMX fwht4x4" Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit 89fbf3de501b5d7fd90047192521eae3198705cd. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd	2014-05-07 12:53:27 +01:00
Alex Converse	75d05d5ed4	Merge "Add an MMX fwht4x4"	2014-05-06 11:12:27 -07:00
Alex Converse	89fbf3de50	Add an MMX fwht4x4 7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64	2014-05-05 15:10:48 -07:00
Jingning Han	52ae97b6aa	SSSE3 implementation of full inverse 8x8 2D-DCT This commit enables SSSE3 version full inverse 8x8 2D-DCT and reconstruction. It makes the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3	2014-05-05 10:49:27 -07:00
Dmitry Kovalev	25a666ef39	Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c. Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a	2014-05-01 16:45:49 -07:00
Dmitry Kovalev	e05b92c0aa	Merge "Removing half-variance asm functions which are not used."	2014-05-01 14:50:45 -07:00
Jingning Han	39761eb5d6	Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT"	2014-04-30 13:41:36 -07:00
Dmitry Kovalev	94f5491c46	Removing half-variance asm functions which are not used. Corresponding C functions were removed in I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3 Change-Id: I50a5575065a7a9e41904eb2161afd739def927db	2014-04-30 12:21:54 -07:00
Jingning Han	1eaa3a76dc	Enable SSSE3 implementation of 8x8 forward 2D-DCT Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4	2014-04-29 15:49:18 -07:00
Dmitry Kovalev	6e01079cc0	Removing unused vp9_variance_halfpixvar*() functions. Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3	2014-04-25 11:50:07 -07:00
Dmitry Kovalev	2fc3a18653	Removing unused vp9_mcomp_x86.h file. We don't use declarations from this file. The real declarations (differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad. Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0	2014-04-14 11:32:58 -07:00
levytamar82	0fa8b668c1	AVX2 SAD Optimization: 2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb	2014-03-21 13:53:32 -07:00
Yaowu Xu	5511968f21	Removed several unused functions. Change-Id: Ib9e27298c575afc02a98b593bc6ad60762064d9b	2014-03-17 14:09:29 -07:00
Andrew Russell	e337322e63	Merge "improved speed of 4x4 sse2 fdct."	2014-03-05 14:35:44 -08:00
Andrew Russell	a46f5459c3	improved speed of 4x4 sse2 fdct. * speed improvment of 30 percent achieved * multiplies and adds remain the same * non-arithmetic instructions minimized by hand, by: -expanding 2 pass loop -removing irrelivant "shuffles" -combining last two rounding steps * further improvments may be possible Change-Id: Idec2c3f52910c48e6a0e0f9aefed5cae31b0b8c0	2014-03-03 14:25:42 -08:00
levytamar82	ea14909687	AVX2 SubPixel AVG Variance Optimization Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_avg_variance64x64 2. vp9_sub_pixel_avg_variance32x32 both of those function were calling vp9_sub_pixel_avg_variance16xh_ssse3 instead of calling that function, it calls vp9_sub_pixel_avg_variance32xh_avx2 that is written in avx2 and process 32 elements in parallel. This Optimization gave 80% function level gain and 2% user level gain Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8	2014-02-28 22:51:04 -07:00
James Zern	d12b39daab	vp9_subpel_variance_impl_intrin_avx2.c: make some tables static + fix formatting Change-Id: I7b4ec11b7b46d8926750e0b69f7a606f3ab80895	2014-02-18 20:42:49 -08:00
levytamar82	52dac5d1cb	AVX2 SubPixel Variance Optimization Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_variance64x64 2. vp9_sub_pixel_variance32x32 both of those function were calling vp9_sub_pixel_variance16xh_ssse3 instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2 that is written in avx2 and process 32 elements in parallel. This Optimization gave 70% function level gain and 2% user level gain Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde	2014-02-14 16:59:11 -07:00
Andrew Russell	549c31f8ae	minor spelling cleanup in comments Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06	2014-02-12 16:32:51 -08:00
Yunqing Wang	0d43bd77e5	Bug fix in ssse3 quantize function A bug was reported in Issue 702: "SIGILL (Illegal instruction) when transcoding with vp9 - using FFmpeg". It was reproduced and fixed. Change-Id: Ie32c149a89af02856084aeaf289e848a905c7700	2014-02-07 14:32:30 -08:00
Dmitry Kovalev	005fc6970b	Finally removing "short" from transform names. Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d	2014-02-06 11:54:15 -08:00
Dmitry Kovalev	ff41764920	Removing _1d suffix from transform names. It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0	2014-01-27 16:15:36 -08:00
James Zern	b453941caf	vp9/encoder: add extern "C" to headers Change-Id: I4f51ce859a97bf1b8fd2b37ac585b7c643232b69	2014-01-23 16:21:24 -08:00
levytamar82	357b65369f	AVX2 Variance Optimization Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32, vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, vp9_mse16x16 by migrating to AVX2 some of the functions were optimized by processing 32 elements instead of 16. some of the functions were optimized by processing 2 loop strides of 16 elements in a single 256 bit register This optimization gives between 2.4% - 2.7% user level performance gain and 42% function level gain. Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d	2014-01-08 12:05:53 -07:00
James Zern	bd9a388a06	vp9: normalize include guards Change-Id: If4ddbdcfb3ab387cbca6910b42cf4df8111e6879	2013-12-16 19:40:49 -08:00
Yaowu Xu	e9c19617bf	Merge "vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2"	2013-11-27 10:27:32 -08:00
levytamar82	8def766de2	vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2 Change-Id: I6366e84490883b72362f762369d7e5bccb64f02f	2013-11-21 14:19:49 -08:00
Abo Talib Mahfoodh	ec2dbdd107	Improve vp9_fdct4x4_sse2 (x1.2) Modifications are done to reduce the total clock cycle. Speedup: 1.2 Tested with: park_joy_420_720p50.y4m Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f	2013-11-21 15:04:35 -05:00
Jingning Han	fabc783695	Fix an overflow issue in SSE2 forward ADST The step that sums three input samples could potentially cause the intermediate result go beyond 16 bit limit, when operating as the second 1-D transform. This commit fixes the issue. Change-Id: Iaf512449ac2d25ddd8a806d760afab362c62a516	2013-11-13 15:15:59 -08:00
Yunqing Wang	d7289658fb	Remove TEXTREL from 32bit encoder This patch fixed the issue reported in "Issue 655: remove textrel's from 32-bit vp9 encoder". The set of vp9_subpel_variance functions that used x86inc.asm ABI didn't build correctly for 32bit PIC. The fix was carefully done under the situation that there was not enough registers. After the change, we got $ eu-findtextrel libvpx.so eu-findtextrel: no text relocations reported in 'libvpx.so' Change-Id: I1b176311dedaf48eaee0a1e777588043c97cea82	2013-11-07 13:39:40 -08:00
Dmitry Kovalev	600a3860a4	Making input pointer constant for all fdct/fht functions. Change-Id: I78f7012f967a777ddd39bae6671eb501df6bbfe8	2013-10-24 11:48:25 -07:00
Dmitry Kovalev	fd724f13b0	Renaming vp9_short_fdct4x4 and vp9_short_walsh4x4. For consistency with idct function names. Renames: vp9_short_fdct4x4 -> vp9_fdct4x4 vp9_short_walsh4x4 -> vp9_fwht4x4 Change-Id: Id15497cc1270acca626447d846f0ce9199770f58	2013-10-23 14:28:39 -07:00
Dmitry Kovalev	a018988ce8	Renaming vp9_short_fdct32x32 to vp9_fdct32x32. For consistency with idct function names. Change-Id: Ie77b7178e0894c57cd5cb9243c949eb9224ece18	2013-10-23 13:41:40 -07:00
Dmitry Kovalev	5bdd4d9ccf	Merge "Renaming vp9_short_fdct16x16 to vp9_fdct16x16."	2013-10-23 13:37:09 -07:00

... 2 3 4 5 6 ...

306 Commits