generic-library/vpx

Author	SHA1	Message	Date
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Dmitry Kovalev	1100e262c5	Removing postproc mmx code. Removed functions: * vp9_post_proc_down_and_across_mmx * vp9_mbpost_proc_down_mmx * vp9_plane_add_noise_mmx They all have sse2 equivalent. Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff	2014-09-05 11:52:50 -07:00
Johann	7516abc7dc	Remove vp9_postproc_x86.h This configuration has moved to vp9_rtcd_defs.pl Change-Id: I71a31dbb8d79df226b60dd834324a5af69956c51	2014-08-05 15:46:13 -07:00
hkuang	337e8015c9	Move vp9_thread.* to common. Prepare for frame parallel decoding, the reference count buffers need to be protected by mutex. Move vp9_thread.* to common folder so that those buffers could use cross-platform mutex from vp9_thread.*. Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154	2014-07-07 14:52:19 -07:00
Jingning Han	59c3f446fe	Merge "Inverse 16x16 2D-DCT SSSE3 implementation"	2014-05-23 16:01:22 -07:00
Jingning Han	48b0891370	Inverse 16x16 2D-DCT SSSE3 implementation This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d	2014-05-23 15:09:35 -07:00
Dmitry Kovalev	72ab966d5e	Removing vp9_pragmas.h. Change-Id: I9120a87e27e73e496932d11716937e2fad246521	2014-05-22 13:46:31 -07:00
Deb Mukherjee	e272273443	Renames x86_64 specific asm files Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536	2014-05-21 13:55:56 -07:00
Johann	ce23931a3f	Only build neon assembly for armv7 targets Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477	2014-05-12 08:52:02 -07:00
Jingning Han	52ae97b6aa	SSSE3 implementation of full inverse 8x8 2D-DCT This commit enables SSSE3 version full inverse 8x8 2D-DCT and reconstruction. It makes the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3	2014-05-05 10:49:27 -07:00
Dmitry Kovalev	3f1ab25812	Removing vp9_onyx.h and moving its content to the encoder. Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2	2014-03-05 23:33:22 -08:00
James Zern	805078a1bf	build: convert rtcd.sh to perl significantly speeds up file generation. the goal of this change is to convert rtcd.sh to perl as directly as possible to allow for simple comparison. future changes can make it more perl-like. --- Linux [CREATE] vpx_scale_rtcd.h real 0m0.485s -> 0m0.022s [CREATE] vp8_rtcd.h real 0m4.619s -> 0m0.060s [CREATE] vp9_rtcd.h real 0m10.102s -> 0m0.087s Windows [CREATE] vpx_scale_rtcd.h real 0m8.360s -> 0m0.080s [CREATE] vp8_rtcd.h real 1m8.083s -> 0m0.160s [CREATE] vp9_rtcd.h real 2m6.489s -> 0m0.233s Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee	2014-03-03 14:47:11 -08:00
Dmitry Kovalev	2c594a5275	Removing vp9_systemdependent.c. Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271	2014-02-26 18:07:23 -08:00
levytamar82	3068d7d944	SSSE3 convolution optimization Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2 my optimization include: -processing 2x8 elements in one 128 bit register instead of processing 8 elements in one 128 bit register. -removing unecessary loads. This optimization gives between 2.4% user level gain for 480p input and 1.6% user level gain for 720p. This Optimization is done only for 64 bit Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b	2014-02-14 15:08:42 -07:00
levytamar82	876c72a093	AVX2 Convolve Optimization Two convolve functions were optimized for AVX2: 1. vp9_filter_block1d16_h8 2. vp9_filter_block1d16_v8 vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of loop strides by half, two strides were processed in parallel. vp9_filter_block1d16_v8 was also optimized in the same way also some of the loads were being done outside of the loop and by that preventing redundant loads. This Optimization gives 43% function level gain and 1.3% user level gain. Now can be compiled in Windows Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c	2014-02-12 20:45:31 -07:00
Frank Galligan	d51ca0db00	Merge "Add get release decoder frame buffer functions."	2014-02-11 08:19:37 -08:00
James Zern	66bfc69bfc	Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/"	2014-02-10 15:39:28 -08:00
Frank Galligan	e8e152799b	Add get release decoder frame buffer functions. This CL changes libvpx to call a function when a frame buffer is needed for decode. Libvpx will call a release callback when no other frames reference the frame buffer. This CL adds a default implementation of the frame buffer callbacks. Currently only VP9 is supported. A future CL will add support for applications to supply their own frame buffer callbacks. Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d	2014-02-10 14:08:11 -08:00
James Zern	7cf0c783c1	*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/ CONFIG_USE_X86INC is available to every makefile, there's no need to duplicate its value with USE_X86INC Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6	2014-02-04 20:04:38 -08:00
Yunqing Wang	d1961e6fbf	Optimize bilinear sub-pixel filters in ssse3 This patch added ssse3 optimization of bilinear sub-pixel filters. The real time encoder was speeded up by ~1%. Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338	2014-02-04 08:01:55 -08:00
Dmitry Kovalev	282f36adc4	Merge "Removing "_short" suffix from arm transform file names."	2014-02-03 14:28:47 -08:00
Yunqing Wang	2488cb34bc	Optimize bilinear sub-pixel filters in sse2 Using bilinear filters could speed up the codec in real-time mode. This patch added sse2 optimizations of bilinear filters that operate on different-sized blocks. Tests showed that the real-time encoder was speeded up by 3%. Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f	2014-02-03 10:34:45 -08:00
Jim Bankoski	9dec7712ab	static function convert to inline or global vp9_blockd.h Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe	2014-01-31 19:50:40 -08:00
Dmitry Kovalev	c49b08c9a1	Removing "_short" suffix from arm transform file names. Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507	2014-01-31 17:19:02 -08:00
Yunqing Wang	d2bb0c51d3	Revert "Revert "Revert "SSSE3 convolution optimization""" This reverts commit `f9404f2406`. This patch caused some ASAN error. Change-Id: If15b7e581310e19061d111c69f2931809662ed19	2014-01-16 16:11:46 -08:00
Yunqing Wang	f9404f2406	Revert "Revert "SSSE3 convolution optimization"" This reverts commit `b645257121`. Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45	2014-01-13 12:29:55 -08:00
Paul Wilkins	b645257121	Revert "SSSE3 convolution optimization" This reverts commit `511d218c60`. In current form intrinsics break borg build. Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9	2014-01-10 13:38:26 +00:00
Yunqing Wang	f3b9b97c0e	Merge "SSSE3 convolution optimization"	2014-01-09 12:39:47 -08:00
levytamar82	511d218c60	SSSE3 convolution optimization Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2 my optimization include: -processing 2x8 elements in one 128 bit register instead of processing 8 elements in one 128 bit register. -removing unecessary loads. This optimization gives between 2.4% user level gain for 480p input and 1.6% user level gain for 720p. This Optimization done only for 64bit. Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969	2014-01-09 12:27:51 -07:00
hkuang	5be0ed30dc	Merge "Add initial intra frame neon optimization. 1~2% gain."	2014-01-08 14:41:43 -08:00
hkuang	691111aacf	Add initial intra frame neon optimization. 1~2% gain. More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8	2014-01-08 11:58:42 -08:00
Dmitry Kovalev	987810ad95	Removing vp9_findnearmv.{h, c} files. Moving all code from that files to vp9_mvref_common.{h, c}. Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20	2013-12-19 17:39:57 -08:00
Dmitry Kovalev	b5c9261832	Converting vp9_treecoder.h to vp9_prob.{h, c} Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788	2013-12-16 12:53:09 -08:00
Dmitry Kovalev	4ac6a2552b	Moving vp9_tree_probs_from_distribution() to encoder. Writing custom coeff branch count calculation (which is much clearer) in adapt_coef_probs() function. Removing vp9_treecoder.c file. Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f	2013-12-05 18:13:26 -08:00
Dmitry Kovalev	4afd141a05	Removing vp9_default_coef_probs.h file. Moving all probability tables from removed file to vp9_entropy.c. Change-Id: I12846f1da778c3016d96b82e53384d4634883430	2013-12-04 17:04:35 -08:00
Frank Galligan	b4874e2c82	Fix 16 wide neon horz loopfilter. Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors. Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc	2013-11-26 10:02:40 -08:00
Frank Galligan	97d1258375	Revert "Add 16 wide neon horz loopfilter." The change caused mismatches with some test vectors on neon. Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/ Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0	2013-11-21 14:01:33 -08:00
Frank Galligan	2dd77580c0	Merge "Add 16 wide neon horz loopfilter."	2013-11-21 10:29:30 -08:00
Frank Galligan	98de15137e	Add 16 wide neon horz loopfilter. Add support to do 16 pixel horizontal filtering in Neon. Nexus devices saw about 0.5% decode speed increase. Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d	2013-11-21 09:39:36 -08:00
Yaowu Xu	30b03050a2	Move vp9_sadmxn.h from common to encoder Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b	2013-11-19 12:46:08 -08:00
Yaowu Xu	a42ab027fd	Merge "Move vp9_extend.{h,c} from common to encoder"	2013-11-18 15:43:32 -08:00
Yaowu Xu	1c61e1960d	Move vp9_extend.{h,c} from common to encoder Since they used in encoder only. This commit also re-order includes for the files that include vp9_extend.h Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459	2013-11-18 12:43:36 -08:00
Yunqing Wang	64f728caef	Do horizontal loopfiltering in parallel This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35	2013-11-15 16:18:43 -08:00
Johann	4da2a8b718	Merge "mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)"	2013-11-13 09:00:09 -08:00
Parag Salasakar	1530a6b77f	mips dsp-ase r2 vp9 decoder intra module optimizations (rebase) Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c	2013-11-13 11:17:14 +05:30
Parag Salasakar	248cf6f69f	mips dsp-ase r2 vp9 decoder loopfilter module optimizations (rebase) Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d	2013-11-13 10:53:16 +05:30
hkuang	a6462990e6	Merge "Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3."	2013-11-07 14:42:29 -08:00
hkuang	6b16f63332	Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3. Change-Id: I034848cf05031618818f7df2e7f9c35102686948	2013-11-05 14:57:32 -08:00
Tamar Levy	54f9205653	mb_lpf_horizontal_edge AVX2 optimization This CL contains two AVX2 optimized loop filter functions, mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16. Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b	2013-10-31 10:26:15 -06:00

1 2 3 4

173 Commits