generic-library/vpx

Author	SHA1	Message	Date
John Koleszar	3de8ee6ba1	Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental * changes: Restore SSSE3 subpixel filters in new convolve framework Convert subpixel filters to use convolve framework Add 8-tap generic convolver	2013-02-08 13:19:47 -08:00
John Koleszar	29d47ac80e	Restore SSSE3 subpixel filters in new convolve framework This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c	2013-02-08 12:18:14 -08:00
Jingning Han	d15e1da494	Butterfly ADST based hybrid transform Refactor the 8x8 inverse hybrid transform. It is now consistent with the new inverse DCT. Overall performance loss (due to the use of this variant ADST, and the rounding errors in the butterfly implementation) for std-hd is -0.02. Fixed BUILD warning. Devise a variant of the original ADST, which allows butterfly computation structure. This new transform has kernel of the form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures using floating-point multiplications was reported in Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform", IEEE Trans. on ASSP, 1984. This patch includes the butterfly implementation of the inverse ADST/DCT hybrid transform of dimension 8x8. Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d	2013-02-07 10:07:46 -08:00
Ronald S. Bultje	a788e0fe63	Add sse2 versions of sub_pixel_variance{32x32,64x64}. 7.5% faster overall encoding. Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f	2013-02-06 11:20:59 -08:00
Ronald S. Bultje	1407bdc243	[WIP] Add column-based tiling. This patch adds column-based tiling. The idea is to make each tile independently decodable (after reading the common frame header) and also independendly encodable (minus within-frame cost adjustments in the RD loop) to speed-up hardware & software en/decoders if they used multi-threading. Column-based tiling has the added advantage (over other tiling methods) that it minimizes realtime use-case latency, since all threads can start encoding data as soon as the first SB-row worth of data is available to the encoder. There is some test code that does random tile ordering in the decoder, to confirm that each tile is indeed independently decodable from other tiles in the same frame. At tile edges, all contexts assume default values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode), and motion vector search and ordering do not cross tiles in the same frame. t log Tile independence is not maintained between frames ATM, i.e. tile 0 of frame 1 is free to use motion vectors that point into any tile of frame 0. We support 1 (i.e. no tiling), 2 or 4 column-tiles. The loopfilter crosses tile boundaries. I discussed this briefly with Aki and he says that's OK. An in-loop loopfilter would need to do some sync between tile threads, but that shouldn't be a big issue. Resuls: with tiling disabled, we go up slightly because of improved edge use in the intra4x4 prediction. With 2 tiles, we lose about ~1% on derf, ~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5% on derf ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is concentrated in the low-bitrate end of clips, and most of it is because of the loss of edges at tile boundaries and the resulting loss of intra predictors. TODO: - more tiles (perhaps allow row-based tiling also, and max. 8 tiles)? - maybe optionally (for EC purposes), motion vectors themselves should not cross tile edges, or we should emulate such borders as if they were off-frame, to limit error propagation to within one tile only. This doesn't have to be the default behaviour but could be an optional bitstream flag. Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f	2013-02-05 15:43:03 -08:00
Ronald S. Bultje	822864131b	Merge "Add SSE3 versions for sad{32x32,64x64}x4d functions." into experimental	2013-02-05 15:40:46 -08:00
Ronald S. Bultje	58c983d109	Add SSE3 versions for sad{32x32,64x64}x4d functions. Overall encoding about 15% faster. Change-Id: I176a775c704317509e32eee83739721804120ff2	2013-02-05 15:21:47 -08:00
John Koleszar	7a07eea13f	Convert subpixel filters to use convolve framework Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus for whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR, 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff	2013-02-05 14:23:17 -08:00
John Koleszar	5ca6a3667f	Add 8-tap generic convolver This commit introduces a new convolution function which will be used to replace the existing subpixel interpolation functions. It is much the same as the existing functions, but allows for changing the filter kernel on a per-pixel basis, and doesn't bake in knowledge of the filter to be applied or the size of the resulting block into the function name. Replacing the existing subpel filters will come in a later commit. Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91	2013-02-05 14:19:28 -08:00
Scott LaVarnway	5780c4cbd5	Added vp9_short_idct1_32x32_c and called this function in vp9_dequant_idct_add_32x32_c when eob == 1. For the test clip used, the decoder performance improved by 21+%. Based on Yaowu's 16 point idct work. Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43	2013-02-04 16:49:17 -08:00
Yaowu Xu	1eb79dc1dc	re-write 8 point idct to be consistent with idct16 and idct32. Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204	2013-02-04 07:31:25 -08:00
Yaowu Xu	ccaaeb4b5a	a couple of minor fixes fixed a function prototypes to prevent compiler warnings; removed a function not in use; un-capitialize "Refstride" to ref_stride Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba	2013-02-04 07:19:32 -08:00
Yaowu Xu	91e0e80142	Changes 16 point idct This commit changes the inverse 16 point dct to use the same algorithm as the one for 32 point idct. In fact, now 16 point dct uses the exact version of the souce code for even portion of the 32 point idct. Tests showed current implementation has significant better accuracy than the previous version. With this implementation and the minor bug fix on forward 16 point dct, encoding tests showed about 0.2% better compression of CIF set, test results on std-hd setting pending. Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63	2013-01-31 19:52:18 -08:00
Ronald S. Bultje	c9071601a2	Remove compound intra-intra experiment. This experiment gives little gains and adds relatively much code complexity (and it hinders other experiments), so let's get rid of it. Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a	2013-01-14 15:47:25 -08:00
Yaowu Xu	741fbe9656	Merge experiment "subpelrefmv" Change-Id: Iac7f3d108863552b850c92c727e00c95571c9e96	2013-01-14 15:18:47 -08:00
Yaowu Xu	f7dab60096	Merge experiment "widerlpf" Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86	2013-01-14 15:17:35 -08:00
Scott LaVarnway	4987c0f07e	Initial sse2 version of the wide loopfilters Updated the rtcd_defs and used the sse2 uv version of the loopfilter. The performance improved by ~8% for the test clip used. Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36	2013-01-11 14:54:14 -08:00
Jim Bankoski	9431536045	rtcd for new wider loop filters Change-Id: I8826bcdcf72ba6d86bde31cd13902a710399805c	2013-01-11 09:45:45 -08:00
Ronald S. Bultje	aa2effa954	Merge tx32x32 experiment. Change-Id: I615651e4c7b09e576a341ad425cf80c393637833	2013-01-10 08:23:59 -08:00
Ronald S. Bultje	6884a83f06	Merge superblocks64 experiment. Change-Id: If6c88752dffdb566f8d4322f135145270716fb8e	2013-01-09 17:21:40 -08:00
Adrian Grange	7d6b5425d7	New prediction filter This patch removes the old pred-filter experiment and replaces it with one that is implemented using the switchable filter framework. If the pred-filter experiment is enabled, three interopolation filters are tested during mode selection; the standard 8-tap interpolation filter, a sharp 8-tap filter and a (new) 8-tap smoothing filter. The 6-tap filter code has been preserved for now and if the enable-6tap experiment is enabled (in addition to the pred-filter experiment) the original 6-tap filter replaces the new 8-tap smooth filter in the switchable mode. The new experiment applies the prediction filter in cases of a fractional-pel motion vector. Future patches will apply the filter where the mv is pel-aligned and also to intra predicted blocks. Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4	2013-01-09 12:00:39 -08:00
Ronald S. Bultje	cd0f36b24f	Merge "Merge superblocks (32x32) experiment." into experimental	2013-01-08 13:31:37 -08:00
Yunqing Wang	f1c56a8c8c	Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental	2013-01-08 12:59:08 -08:00
Ronald S. Bultje	4455036cfc	Merge superblocks (32x32) experiment. Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587	2013-01-08 12:54:45 -08:00
Yunqing Wang	8d568312a2	vp9_sub_pixel_variance16x2 SSE2 optimization About 5% decoder speedup. Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794	2013-01-08 12:01:55 -08:00
John Koleszar	879cb7d962	Merge vp9-preview changes into experimental branch Incorportate vp9-preview changes by merging master branch into experimental. Conflicts: test/test.mk vp9/common/vp9_filter.c vp9/common/vp9_idctllm.c vp9/common/vp9_invtrans.h vp9/common/vp9_mbpitch.c vp9/common/vp9_rtcd_defs.sh vp9/common/vp9_systemdependent.h vp9/common/vp9_type_aliases.h vp9/common/x86/vp9_asm_stubs.c vp9/common/x86/vp9_subpixel_mmx.asm vp9/decoder/vp9_decodframe.c vp9/decoder/vp9_dequantize.c vp9/decoder/vp9_dequantize.h vp9/decoder/vp9_onyxd_int.h vp9/encoder/vp9_bitstream.c vp9/encoder/vp9_encodeframe.c vp9/encoder/vp9_rdopt.c Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5	2013-01-08 10:19:59 -08:00
Ronald S. Bultje	c3941665e9	64x64 blocksize support. 3.2% gains on std/hd, 1.0% gains on hd. Change-Id: I481d5df23d8a4fc650a5bcba956554490b2bd200	2013-01-05 18:20:25 -08:00
John Koleszar	5ebe94f9f1	Build fixes to merge vp9-preview into master Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07	2012-12-26 11:21:09 -08:00
Scott LaVarnway	89ac94f8fb	Removed mmx versions of vp9_bilinear_predict filters These filters will not work with VP9. Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85	2012-12-21 14:41:49 -08:00
Scott LaVarnway	08dabbcee1	Disabled x86inc style assembly functions Temporary fix for 32-bit mac build errors. Change-Id: I2038f033cac16ea796097d0edd0f1c3da03246d7	2012-12-19 11:53:43 -08:00
Ronald S. Bultje	4cca47b538	Use standard integer types for pixel values and coefficients. For coefficients, use int16_t (instead of short); for pixel values in 16-bit intermediates, use uint16_t (instead of unsigned short); for all others, use uint8_t (instead of unsigned char). Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da	2012-12-18 15:31:19 -08:00
Scott LaVarnway	b575394e21	Improved vp9_ihtllm_c As suggested by Yaowu, we can use eob to reduce the complexity of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder performance improved by 17%. Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435	2012-12-12 15:49:39 -08:00
Ronald S. Bultje	c456b35fdf	32x32 transform for superblocks. This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds code all over the place to wrap that in the bitstream/encoder/decoder/RD. Some implementation notes (these probably need careful review): - token range is extended by 1 bit, since the value range out of this transform is [-16384,16383]. - the coefficients coming out of the FDCT are manually scaled back by 1 bit, or else they won't fit in int16_t (they are 17 bits). Because of this, the RD error scoring does not right-shift the MSE score by two (unlike for 4x4/8x8/16x16). - to compensate for this loss in precision, the quantizer is halved also. This is currently a little hacky. - FDCT and IDCT is double-only right now. Needs a fixed-point impl. - There are no default probabilities for the 32x32 transform yet; I'm simply using the 16x16 luma ones. A future commit will add newly generated probabilities for all transforms. - No ADST version. I don't think we'll add one for this level; if an ADST is desired, transform-size selection can scale back to 16x16 or lower, and use an ADST at that level. Additional notes specific to Debargha's DWT/DCT hybrid: - coefficient scale is different for the top/left 16x16 (DCT-over-DWT) block than for the rest (DWT pixel differences) of the block. Therefore, RD error scoring isn't easily scalable between coefficient and pixel domain. Thus, unfortunately, we need to compute the RD distortion in the pixel domain until we figure out how to scale these appropriately. Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b	2012-12-07 14:45:05 -08:00
Johann	52d350febf	Begin to refactor vpx_scale usage in VP9 Only declare the functions in vpx_scale RTCD and include the relevant header. Remove unused files and functions in vpx_scale to avoid wasting time renaming. vpx_scale/win32/scaleopt.c contains functions which have not been called in a long time but are potentially optimized. The 'vp8' functions have not been renamed yet. That is for after the cleanup. Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c	2012-12-05 08:59:40 -08:00
Johann	a905672906	Remove ARM optimizations from VP9 Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b	2012-12-05 08:59:25 -08:00
Johann	c6bd29e2f5	Begin to refactor vpx_scale usage in VP9 Only declare the functions in vpx_scale RTCD and include the relevant header. Remove unused files and functions in vpx_scale to avoid wasting time renaming. vpx_scale/win32/scaleopt.c contains functions which have not been called in a long time but are potentially optimized. The 'vp8' functions have not been renamed yet. That is for after the cleanup. Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c	2012-12-03 12:51:56 -08:00
Johann	34591b54dd	Remove ARM optimizations from VP9 Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b	2012-12-03 12:50:15 -08:00
Jim Bankoski	9f9370425b	warnings in various experiments Change-Id: Ib5106d4772450f8026f823dd743f162ab833b1d6	2012-11-30 07:31:37 -08:00
Jim Bankoski	030e268a90	ihtllm moves to rtcd clears up some warnings Change-Id: I9899637497c6ad7519f098e055ab98580ae6d688	2012-11-29 07:19:38 -08:00
Jim Bankoski	85cba19e16	remove postproc invokes and some miscellaneous invoke left overs Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43	2012-11-28 10:00:25 -08:00
John Koleszar	fcccbcbb39	Add vp9_ prefix to all vp9 files Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc	2012-11-27 14:12:30 -08:00

1 2 3 4 5

241 Commits