generic-library/vpx

Author	SHA1	Message	Date
Yunqing Wang	f9404f2406	Revert "Revert "SSSE3 convolution optimization"" This reverts commit `b645257121`. Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45	2014-01-13 12:29:55 -08:00
James Zern	f83c12b540	Merge "cosmetics: vp9_reconinter.h: make some variables const"	2014-01-11 12:39:32 -08:00
Dmitry Kovalev	96be0a50ab	Removing mi_height_log2_lookup table. Change-Id: I1f0ae2edc3a96b33c0494d165ae756a8feba6184	2014-01-10 13:29:47 -08:00
Paul Wilkins	b645257121	Revert "SSSE3 convolution optimization" This reverts commit `511d218c60`. In current form intrinsics break borg build. Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9	2014-01-10 13:38:26 +00:00
Jingning Han	a4c94a94cc	Merge "Optimze inv 16x16 DCT with 10 non-zero coeffs - P2"	2014-01-09 18:17:25 -08:00
Jingning Han	faa2ba86cc	Merge "Optimze inv 16x16 DCT with 10 non-zero coeffs - P1"	2014-01-09 18:17:12 -08:00
Dmitry Kovalev	c8e8d3a461	Merge "Renaming 'Sharpness' to 'sharpness'."	2014-01-09 13:42:55 -08:00
Jingning Han	af31b27aae	Optimze inv 16x16 DCT with 10 non-zero coeffs - P2 This commit further optimizes SSE2 operations in the second 1-D inverse 16x16 DCT, with (<10) non-zero coefficients. The average runtime of this module goes down from 779 cycles -> 725 cycles. Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f	2014-01-09 12:46:09 -08:00
Yunqing Wang	f3b9b97c0e	Merge "SSSE3 convolution optimization"	2014-01-09 12:39:47 -08:00
levytamar82	511d218c60	SSSE3 convolution optimization Optimizing all SSSE3 assembly for convolution: 1. vp9_filter_block1d4_h8_sse2 2. vp9_filter_block1d8_h8_sse2 3. vp9_filter_block1d16_h8_sse2 4. vp9_filter_block1d4_v8_sse2 5. vp9_filter_block1d8_v8_sse2 6. vp9_filter_block1d16_v8_sse2 The optimizations include: - processing 2x8 elements in one 128-bit register instead of processing 8 elements in one 128-bit register. - removing unnecessary loads. This optimization gives a user-level gain of 2.4% for 480p input and 1.6% for 720p. This optimization is done only for 64-bit. Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969	2014-01-09 12:27:51 -07:00
Dmitry Kovalev	4fbe54d201	Merge "Renaming 'Mode' to 'mode'."	2014-01-08 16:29:29 -08:00
Jingning Han	ba6ab46cdc	Optimze inv 16x16 DCT with 10 non-zero coeffs - P1 This commit is the first patch optimizing the SSE2 implementation of the inverse 16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row) transformation. It exploits the fact that only the top-left 4x4 block contains non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients. The average runtime of the idct16x16_10 unit is reduced from 883 cycles -> 779 cycles (12% faster). For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes down from 310651 ms -> 305910 ms. The decoding speed goes up from 80.37 fps -> 80.87 fps. Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645	2014-01-08 15:36:45 -08:00
Alex Converse	8fcb74e6bb	Merge "Add a C fallback for get_msb() and change inline to INLINE."	2014-01-08 14:43:46 -08:00
hkuang	5be0ed30dc	Merge "Add initial intra frame neon optimization. 1~2% gain."	2014-01-08 14:41:43 -08:00
Dmitry Kovalev	962c8b241e	Renaming 'Mode' to 'mode'. Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362	2014-01-08 14:33:59 -08:00
Dmitry Kovalev	57be81369a	Renaming 'Sharpness' to 'sharpness'. Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4	2014-01-08 14:19:14 -08:00
Alex Converse	ce7ff3b63d	Add a C fallback for get_msb() and change inline to INLINE. For systems without __builtin_clz() or _BitScanReverse(). Taken from libwebp. Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e	2014-01-08 12:25:47 -08:00
hkuang	691111aacf	Add initial intra frame neon optimization. 1~2% gain. More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8	2014-01-08 11:58:42 -08:00
Yunqing Wang	a84029ad9c	Merge "AVX2 Variance Optimization"	2014-01-08 11:33:42 -08:00
levytamar82	357b65369f	AVX2 Variance Optimization Optimizing the variance functions vp9_variance16x16, vp9_variance32x32, vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and vp9_mse16x16 by migrating to AVX2. Some of the functions were optimized by processing 32 elements instead of 16. Some of the functions were optimized by processing 2 loop strides of 16 elements in a single 256-bit register. This optimization gives a 2.4% - 2.7% user-level performance gain and a 42% function-level gain. Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d	2014-01-08 12:05:53 -07:00
Alex Converse	f2ca665f1c	Replace RD modeling with a fixed point approximation. Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec	2014-01-08 10:37:24 -08:00
Dmitry Kovalev	bbb25e6a39	Merge "Adding RefBuffer struct."	2014-01-06 14:19:44 -08:00
Jingning Han	b49e9fb433	Merge "Tune IDCT8_1D macro function interface"	2014-01-06 09:38:19 -08:00
Dmitry Kovalev	0c5575fe57	Merge "Moving hev mask calculation into filter4() function."	2014-01-03 15:56:16 -08:00
Jingning Han	3e0c62b53f	Tune IDCT8_1D macro function interface This commit adds input/output ports to the IDCT8_1D macro function to provide more flexibility in variable use. This allows several buffer swap operations to be skipped. Change-Id: I21f3450509537322293043b3281bfd3949868677	2014-01-03 15:23:47 -08:00
Dmitry Kovalev	ba41e9d459	Adding RefBuffer struct. Adding RefBuffer to simplify reference buffer management. The struct has a pointer to image data and scale factors relative to the current frame. Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e	2014-01-03 15:21:55 -08:00
Jingning Han	0b1a27135a	Reduce num of buffer swap calls in idct8_1d_sse2 This commit merges the initial buffer swap operations in idct8_1d_sse2 into the array transpose step, hence reducing number of instructions therein. Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479	2014-01-03 12:12:03 -08:00
Jingning Han	1bb11781e2	Rework idct8x8_10 SSE2 implementation This commit optimizes the SSE2 implementation of idct8x8_10. It exploits the fact that only the top-left 4x4 block contains non-zero coefficients, and hence reduces the instructions needed. The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles, estimated by averaging over 100000 runs. For pedestrian_area_1080p 300 frames coded at 4000 kbps, the average decoding speed goes up from 79.3 fps to 79.7 fps. Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180	2014-01-03 12:04:09 -08:00
Yaowu Xu	8458c8c450	Merge "Fix show existing frame"	2014-01-02 09:27:28 -08:00
Dmitry Kovalev	f3beca079c	Merge "Calculating has_second_ref only once for single_ref context."	2013-12-26 13:41:02 -08:00
Dmitry Kovalev	1e8b5bf4ac	Merge "Removing vp9_findnearmv.{h, c} files."	2013-12-26 13:38:38 -08:00
James Zern	44963dfd37	cosmetics: vp9_reconinter.h: make some variables const Change-Id: If5cd0a1487e97c8e9d13dc2e078c6dceaf79de4f	2013-12-26 14:02:46 -05:00
Dmitry Kovalev	87440aeb82	Moving MAX_PROB constant to vp9_prob.h. Change-Id: I07470ad1b7a0344d088911428ffab8ba9a0d8708	2013-12-20 15:56:59 -08:00
Dmitry Kovalev	b3b9f4a4d0	Merge "Using single struct to represent scale factors."	2013-12-20 11:22:02 -08:00
Yunqing Wang	b6a0ac11f0	Merge "Code clean up"	2013-12-20 08:46:11 -08:00
Dmitry Kovalev	987810ad95	Removing vp9_findnearmv.{h, c} files. Moving all code from those files to vp9_mvref_common.{h, c}. Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20	2013-12-19 17:39:57 -08:00
Dmitry Kovalev	a3fbcc88bb	Using single struct to represent scale factors. Moving back to scale_factors struct. We no longer need x_offset_q4 and y_offset_q4 because both values are calculated locally inside the vp9_scale_mv function. Change-Id: I78a2122ba253c428a14558bda0e78ece738d2b5b	2013-12-19 16:06:33 -08:00
Dmitry Kovalev	c872d2be65	Call set_scaled_offsets() just before scale_mv() call. Before mv scaling it is required to calculate x_offset_q4/y_offset_q4 by calling set_scaled_offsets(). Now offset configuration can not be missed because it happens just before scale_mv(). Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a	2013-12-19 14:55:13 -08:00
Yunqing Wang	09faf55916	Code clean up Removed unused filter coefficients. Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd	2013-12-19 11:09:23 -08:00
Dmitry Kovalev	c67ee5ea24	Merge "Converting vp9_treecoder.h to vp9_prob.{h, c}"	2013-12-19 11:03:30 -08:00
Marco Paniconi	02d5ebcfdc	Merge "Updates for 1-pass CBR rate control."	2013-12-18 10:28:33 -08:00
Marco Paniconi	1b8b8b0d0d	Updates for 1-pass CBR rate control. Adjustments based on buffer level, frame dropper. Change-Id: Iaa85b570493526a60c4b9fb7ded4c0226b1b3a33	2013-12-18 09:24:24 -08:00
Jim Bankoski	9d754dcca8	Merge "rename loop filter functions"	2013-12-17 18:56:09 -08:00
Jim Bankoski	b720ba165f	rename loop filter functions This renames all the loop filter functions so that they no longer refer to mb Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b	2013-12-17 17:34:34 -08:00
Dmitry Kovalev	118c8fb3fb	Calculating has_second_ref only once for single_ref context. Change-Id: Ib1253e0606426850f53060a4c5303af86bf1c093	2013-12-17 17:02:24 -08:00
Dmitry Kovalev	c6a1ff223b	Merge "Calling is_inter_block() only if mbmi is available."	2013-12-17 16:10:56 -08:00
Dmitry Kovalev	4821084b3f	Moving hev mask calculation into filter4() function. Change-Id: Ieccf2070b2b01b4135f4c5f9857667eb7825c761	2013-12-17 15:23:23 -08:00
Dmitry Kovalev	eb0c73b6e0	Merge "Converting mode_lf_lut struct member into static lookup table."	2013-12-17 15:20:05 -08:00
James Zern	bd9a388a06	vp9: normalize include guards Change-Id: If4ddbdcfb3ab387cbca6910b42cf4df8111e6879	2013-12-16 19:40:49 -08:00
Yaowu Xu	3cce464342	Define POSITION to differentiate from MV The MV struct was used to indicate the position of a MI_BLOCK with row and col components. The expression was confusing, so this commit adds a new structure "POSITION" with row and col components to better describe the position of a mi_block. Change-Id: I59fdd4b45010fe7d85a8db22a55503265c4f5b2b	2013-12-16 17:28:00 -08:00

2017 Commits