generic-library/vpx

Author	SHA1	Message	Date
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
John Koleszar	5ac141187a	Merge "Remove unused vp9_copy32xn" into experimental	2013-02-27 12:23:45 -08:00
John Koleszar	7ad8dbe417	Remove unused vp9_copy32xn This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d	2013-02-27 10:24:56 -08:00
Yunqing Wang	35bc02c6eb	Optimize vp9_dc_only_idct_add_c function Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d	2013-02-26 17:16:13 -08:00
Jingning Han	77a3becf92	clean up forward and inverse hybrid transform Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f	2013-02-25 09:16:12 -08:00
James Zern	e5fb6321a1	give vp9 variance struct a unique name variance_vtable clashed with vp8/common/variance.h Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f	2013-02-22 16:25:13 -08:00
Jingning Han	babbd5d170	Forward butterfly hybrid transform This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1	2013-02-21 18:24:28 -08:00
Ronald S. Bultje	35524e2231	Remove "eobs" array in MACROBLOCKD. The information is a duplicate of "eob" in BLOCKD. Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3	2013-02-21 10:07:36 -08:00
Yaowu Xu	d262e26cc7	Merge lossless experiment Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa	2013-02-20 07:54:28 -08:00
Jingning Han	cd907b1601	16x16 butterfly inverse ADST/DCT hybrid transform rebased. This patch includes 16x16 butterfly inverse ADST/DCT hybrid transform. It uses the variant ADST of kernel sin((2k+1)*(2n+1)/4N), which allows a butterfly implementation. The coding gains as compared to DCT 16x16 are about 0.1% for both derf and std-hd. It is noteworthy that for the std-hd set many sequences gain about 0.5%, some 0.2%. There are also a few points that show -1% to -3% performance. Hence the average goes to about 0.1%. Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17	2013-02-19 09:07:00 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
Scott LaVarnway	7755657ea7	Merge "WIP: ssse3 version of convolve avg functions" into experimental	2013-02-15 07:54:21 -08:00
Yaowu Xu	d3de97794f	Merge "fix the lossless experiment" into experimental	2013-02-13 09:54:35 -08:00
Yaowu Xu	16f25f9dc8	fix the lossless experiment Change-Id: I95acfc1417634b52d344586ab97f0abaa9a4b256	2013-02-13 09:20:26 -08:00
Scott LaVarnway	30f866f44b	WIP: ssse3 version of convolve avg functions Initial ssse3 convolve avg functions; this is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9	2013-02-13 09:15:38 -08:00
Jingning Han	57e995ff9c	butterfly inverse 4x4 ADST fixed format issues. Implement the inverse 4x4 ADST using 9 multiplications. For this particular dimension, the original ADST transform can be factorized into simpler operations, hence is retained. Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960	2013-02-11 10:42:39 -08:00
Ronald S. Bultje	c0ce2ab349	Port sadNxNx4d functions to x86inc.asm. Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8	2013-02-08 17:59:32 -08:00
Ronald S. Bultje	02ff360b33	Add sad64x64 and sad32x32 SSE2 versions. Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this makes them all slightly faster, particularly on x86-64. Remove SSE3 sad16x16 version, since the SSE2 version is now faster. About 1.5% overall encoding speedup. Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797	2013-02-08 16:32:25 -08:00
John Koleszar	3de8ee6ba1	Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental * changes: Restore SSSE3 subpixel filters in new convolve framework Convert subpixel filters to use convolve framework Add 8-tap generic convolver	2013-02-08 13:19:47 -08:00
John Koleszar	29d47ac80e	Restore SSSE3 subpixel filters in new convolve framework This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c	2013-02-08 12:18:14 -08:00
Jingning Han	d15e1da494	Butterfly ADST based hybrid transform Refactor the 8x8 inverse hybrid transform. It is now consistent with the new inverse DCT. Overall performance loss (due to the use of this variant ADST, and the rounding errors in the butterfly implementation) for std-hd is -0.02. Fixed BUILD warning. Devise a variant of the original ADST, which allows butterfly computation structure. This new transform has kernel of the form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures using floating-point multiplications was reported in Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform", IEEE Trans. on ASSP, 1984. This patch includes the butterfly implementation of the inverse ADST/DCT hybrid transform of dimension 8x8. Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d	2013-02-07 10:07:46 -08:00
Ronald S. Bultje	a788e0fe63	Add sse2 versions of sub_pixel_variance{32x32,64x64}. 7.5% faster overall encoding. Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f	2013-02-06 11:20:59 -08:00
Ronald S. Bultje	1407bdc243	[WIP] Add column-based tiling. This patch adds column-based tiling. The idea is to make each tile independently decodable (after reading the common frame header) and also independently encodable (minus within-frame cost adjustments in the RD loop) to speed up hardware and software en/decoders that use multi-threading. Column-based tiling has the added advantage (over other tiling methods) that it minimizes realtime use-case latency, since all threads can start encoding data as soon as the first SB-row worth of data is available to the encoder. There is some test code that does random tile ordering in the decoder, to confirm that each tile is indeed independently decodable from other tiles in the same frame. At tile edges, all contexts assume default values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode), and motion vector search and ordering do not cross tiles in the same frame. Tile independence is not maintained between frames ATM, i.e. tile 0 of frame 1 is free to use motion vectors that point into any tile of frame 0. We support 1 (i.e. no tiling), 2 or 4 column-tiles. The loopfilter crosses tile boundaries. I discussed this briefly with Aki and he says that's OK. An in-loop loopfilter would need to do some sync between tile threads, but that shouldn't be a big issue. Results: with tiling disabled, we go up slightly because of improved edge use in the intra4x4 prediction. With 2 tiles, we lose ~1% on derf, ~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5% on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is concentrated in the low-bitrate end of clips, and most of it is because of the loss of edges at tile boundaries and the resulting loss of intra predictors. TODO: - more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves should not cross tile edges, or we should emulate such borders as if they were off-frame, to limit error propagation to within one tile only. This doesn't have to be the default behaviour but could be an optional bitstream flag. Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f	2013-02-05 15:43:03 -08:00
Ronald S. Bultje	822864131b	Merge "Add SSE3 versions for sad{32x32,64x64}x4d functions." into experimental	2013-02-05 15:40:46 -08:00
Ronald S. Bultje	58c983d109	Add SSE3 versions for sad{32x32,64x64}x4d functions. Overall encoding about 15% faster. Change-Id: I176a775c704317509e32eee83739721804120ff2	2013-02-05 15:21:47 -08:00
John Koleszar	7a07eea13f	Convert subpixel filters to use convolve framework Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus for whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR, 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff	2013-02-05 14:23:17 -08:00
John Koleszar	5ca6a3667f	Add 8-tap generic convolver This commit introduces a new convolution function which will be used to replace the existing subpixel interpolation functions. It is much the same as the existing functions, but allows for changing the filter kernel on a per-pixel basis, and doesn't bake in knowledge of the filter to be applied or the size of the resulting block into the function name. Replacing the existing subpel filters will come in a later commit. Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91	2013-02-05 14:19:28 -08:00
Scott LaVarnway	5780c4cbd5	Added vp9_short_idct1_32x32_c and called this function in vp9_dequant_idct_add_32x32_c when eob == 1. For the test clip used, the decoder performance improved by 21+%. Based on Yaowu's 16 point idct work. Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43	2013-02-04 16:49:17 -08:00
Yaowu Xu	1eb79dc1dc	re-write 8 point idct to be consistent with idct16 and idct32. Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204	2013-02-04 07:31:25 -08:00
Yaowu Xu	ccaaeb4b5a	a couple of minor fixes fixed function prototypes to prevent compiler warnings; removed a function not in use; un-capitalize "Refstride" to ref_stride Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba	2013-02-04 07:19:32 -08:00
Yaowu Xu	91e0e80142	Changes 16 point idct This commit changes the inverse 16 point dct to use the same algorithm as the one for 32 point idct. In fact, now 16 point dct uses the exact version of the source code for even portion of the 32 point idct. Tests showed current implementation has significantly better accuracy than the previous version. With this implementation and the minor bug fix on forward 16 point dct, encoding tests showed about 0.2% better compression of CIF set, test results on std-hd setting pending. Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63	2013-01-31 19:52:18 -08:00
Ronald S. Bultje	c9071601a2	Remove compound intra-intra experiment. This experiment gives little gains and adds relatively much code complexity (and it hinders other experiments), so let's get rid of it. Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a	2013-01-14 15:47:25 -08:00
Yaowu Xu	741fbe9656	Merge experiment "subpelrefmv" Change-Id: Iac7f3d108863552b850c92c727e00c95571c9e96	2013-01-14 15:18:47 -08:00
Yaowu Xu	f7dab60096	Merge experiment "widerlpf" Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86	2013-01-14 15:17:35 -08:00
Scott LaVarnway	4987c0f07e	Initial sse2 version of the wide loopfilters Updated the rtcd_defs and used the sse2 uv version of the loopfilter. The performance improved by ~8% for the test clip used. Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36	2013-01-11 14:54:14 -08:00
Jim Bankoski	9431536045	rtcd for new wider loop filters Change-Id: I8826bcdcf72ba6d86bde31cd13902a710399805c	2013-01-11 09:45:45 -08:00
Ronald S. Bultje	aa2effa954	Merge tx32x32 experiment. Change-Id: I615651e4c7b09e576a341ad425cf80c393637833	2013-01-10 08:23:59 -08:00
Ronald S. Bultje	6884a83f06	Merge superblocks64 experiment. Change-Id: If6c88752dffdb566f8d4322f135145270716fb8e	2013-01-09 17:21:40 -08:00
Adrian Grange	7d6b5425d7	New prediction filter This patch removes the old pred-filter experiment and replaces it with one that is implemented using the switchable filter framework. If the pred-filter experiment is enabled, three interpolation filters are tested during mode selection: the standard 8-tap interpolation filter, a sharp 8-tap filter and a (new) 8-tap smoothing filter. The 6-tap filter code has been preserved for now and if the enable-6tap experiment is enabled (in addition to the pred-filter experiment) the original 6-tap filter replaces the new 8-tap smooth filter in the switchable mode. The new experiment applies the prediction filter in cases of a fractional-pel motion vector. Future patches will apply the filter where the mv is pel-aligned and also to intra predicted blocks. Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4	2013-01-09 12:00:39 -08:00
Ronald S. Bultje	cd0f36b24f	Merge "Merge superblocks (32x32) experiment." into experimental	2013-01-08 13:31:37 -08:00
Yunqing Wang	f1c56a8c8c	Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental	2013-01-08 12:59:08 -08:00
Ronald S. Bultje	4455036cfc	Merge superblocks (32x32) experiment. Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587	2013-01-08 12:54:45 -08:00
Yunqing Wang	8d568312a2	vp9_sub_pixel_variance16x2 SSE2 optimization About 5% decoder speedup. Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794	2013-01-08 12:01:55 -08:00
John Koleszar	879cb7d962	Merge vp9-preview changes into experimental branch Incorporate vp9-preview changes by merging master branch into experimental. Conflicts: test/test.mk vp9/common/vp9_filter.c vp9/common/vp9_idctllm.c vp9/common/vp9_invtrans.h vp9/common/vp9_mbpitch.c vp9/common/vp9_rtcd_defs.sh vp9/common/vp9_systemdependent.h vp9/common/vp9_type_aliases.h vp9/common/x86/vp9_asm_stubs.c vp9/common/x86/vp9_subpixel_mmx.asm vp9/decoder/vp9_decodframe.c vp9/decoder/vp9_dequantize.c vp9/decoder/vp9_dequantize.h vp9/decoder/vp9_onyxd_int.h vp9/encoder/vp9_bitstream.c vp9/encoder/vp9_encodeframe.c vp9/encoder/vp9_rdopt.c Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5	2013-01-08 10:19:59 -08:00
Ronald S. Bultje	c3941665e9	64x64 blocksize support. 3.2% gains on std/hd, 1.0% gains on hd. Change-Id: I481d5df23d8a4fc650a5bcba956554490b2bd200	2013-01-05 18:20:25 -08:00
John Koleszar	5ebe94f9f1	Build fixes to merge vp9-preview into master Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07	2012-12-26 11:21:09 -08:00
Scott LaVarnway	89ac94f8fb	Removed mmx versions of vp9_bilinear_predict filters These filters will not work with VP9. Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85	2012-12-21 14:41:49 -08:00
Scott LaVarnway	08dabbcee1	Disabled x86inc style assembly functions Temporary fix for 32-bit mac build errors. Change-Id: I2038f033cac16ea796097d0edd0f1c3da03246d7	2012-12-19 11:53:43 -08:00
Ronald S. Bultje	4cca47b538	Use standard integer types for pixel values and coefficients. For coefficients, use int16_t (instead of short); for pixel values in 16-bit intermediates, use uint16_t (instead of unsigned short); for all others, use uint8_t (instead of unsigned char). Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da	2012-12-18 15:31:19 -08:00
Scott LaVarnway	b575394e21	Improved vp9_ihtllm_c As suggested by Yaowu, we can use eob to reduce the complexity of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder performance improved by 17%. Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435	2012-12-12 15:49:39 -08:00
Ronald S. Bultje	c456b35fdf	32x32 transform for superblocks. This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds code all over the place to wrap that in the bitstream/encoder/decoder/RD. Some implementation notes (these probably need careful review): - token range is extended by 1 bit, since the value range out of this transform is [-16384,16383]. - the coefficients coming out of the FDCT are manually scaled back by 1 bit, or else they won't fit in int16_t (they are 17 bits). Because of this, the RD error scoring does not right-shift the MSE score by two (unlike for 4x4/8x8/16x16). - to compensate for this loss in precision, the quantizer is halved also. This is currently a little hacky. - FDCT and IDCT is double-only right now. Needs a fixed-point impl. - There are no default probabilities for the 32x32 transform yet; I'm simply using the 16x16 luma ones. A future commit will add newly generated probabilities for all transforms. - No ADST version. I don't think we'll add one for this level; if an ADST is desired, transform-size selection can scale back to 16x16 or lower, and use an ADST at that level. Additional notes specific to Debargha's DWT/DCT hybrid: - coefficient scale is different for the top/left 16x16 (DCT-over-DWT) block than for the rest (DWT pixel differences) of the block. Therefore, RD error scoring isn't easily scalable between coefficient and pixel domain. Thus, unfortunately, we need to compute the RD distortion in the pixel domain until we figure out how to scale these appropriately. Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b	2012-12-07 14:45:05 -08:00
Johann	52d350febf	Begin to refactor vpx_scale usage in VP9 Only declare the functions in vpx_scale RTCD and include the relevant header. Remove unused files and functions in vpx_scale to avoid wasting time renaming. vpx_scale/win32/scaleopt.c contains functions which have not been called in a long time but are potentially optimized. The 'vp8' functions have not been renamed yet. That is for after the cleanup. Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c	2012-12-05 08:59:40 -08:00
Johann	a905672906	Remove ARM optimizations from VP9 Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b	2012-12-05 08:59:25 -08:00
Johann	c6bd29e2f5	Begin to refactor vpx_scale usage in VP9 Only declare the functions in vpx_scale RTCD and include the relevant header. Remove unused files and functions in vpx_scale to avoid wasting time renaming. vpx_scale/win32/scaleopt.c contains functions which have not been called in a long time but are potentially optimized. The 'vp8' functions have not been renamed yet. That is for after the cleanup. Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c	2012-12-03 12:51:56 -08:00
Johann	34591b54dd	Remove ARM optimizations from VP9 Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b	2012-12-03 12:50:15 -08:00
Jim Bankoski	9f9370425b	warnings in various experiments Change-Id: Ib5106d4772450f8026f823dd743f162ab833b1d6	2012-11-30 07:31:37 -08:00
Jim Bankoski	030e268a90	ihtllm moves to rtcd clears up some warnings Change-Id: I9899637497c6ad7519f098e055ab98580ae6d688	2012-11-29 07:19:38 -08:00
Jim Bankoski	85cba19e16	remove postproc invokes and some miscellaneous invoke left overs Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43	2012-11-28 10:00:25 -08:00
John Koleszar	fcccbcbb39	Add vp9_ prefix to all vp9 files Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc	2012-11-27 14:12:30 -08:00
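
Note on commit 35bc02c6eb above: its message describes clipping the DC difference to [0, 255] so that the additions and subtractions can stay in 8 bits. A minimal scalar sketch of that idea in C might look like the following. It is not the libvpx code: every function name here (clip_u8, sat_add_u8, sat_sub_u8, dc_add_reference, dc_add_8bit, dc_add_mismatches) is hypothetical, and the scalar saturating helpers only stand in for what packed saturating byte operations would do on SSE2 registers.

```c
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Unsigned saturating 8-bit add (scalar stand-in for a packed operation). */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
  unsigned s = (unsigned)a + (unsigned)b;
  return (uint8_t)(s > 255u ? 255u : s);
}

/* Unsigned saturating 8-bit subtract (scalar stand-in). */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* Reference path: widen the pixel, add the signed DC value, then clamp. */
static uint8_t dc_add_reference(uint8_t pred, int dc) {
  return clip_u8((int)pred + dc);
}

/* 8-bit path: split the signed DC value into a positive part and a negative
 * part, each clipped to [0, 255], then apply one saturating add and one
 * saturating subtract. At most one of the two parts is non-zero. */
static uint8_t dc_add_8bit(uint8_t pred, int dc) {
  uint8_t pos = clip_u8(dc);   /* dc if dc > 0, else 0 */
  uint8_t neg = clip_u8(-dc);  /* -dc if dc < 0, else 0 */
  return sat_sub_u8(sat_add_u8(pred, pos), neg);
}

/* Exhaustively compare both paths over all pixels and an assumed DC range
 * of [-1024, 1024]; returns the number of disagreements. */
static int dc_add_mismatches(void) {
  int count = 0;
  for (int pred = 0; pred <= 255; ++pred) {
    for (int dc = -1024; dc <= 1024; ++dc) {
      if (dc_add_8bit((uint8_t)pred, dc) !=
          dc_add_reference((uint8_t)pred, dc)) {
        ++count;
      }
    }
  }
  return count;
}
```

Because a pixel already lies in [0, 255], clipping the magnitude of the DC value to 255 loses nothing: any larger magnitude saturates the result anyway. That is why the 8-bit path agrees with the widened reference, and dc_add_mismatches() returns 0 over the tested range.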


259 Commits