generic-library/vpx

Author	SHA1	Message	Date
Julia Robson	711600e5f1	SSSE3 optimisations of high bit depth masked variance functions Includes tests which compare output of new SSSE3 functions with their C equivalents, and fixes to the C code to ensure these tests pass. Change-Id: Iec3980cce95a8ee6bf9421fa4793130e92c162e3	2015-12-04 11:59:30 -08:00
Julia Robson	ea167a5855	Adding SSSE3 accelerations of masked SAD functions Includes tests of masked SAD function optimisations against C versions Change-Id: I42f198767a113b58ae9456841f4ec71075591720	2015-12-01 09:55:24 -08:00
Julia Robson	8e4d779137	SSSE3 optimisations of masked variance function (8bit ONLY) Includes test which compares output of new SSSE3 functions with their C equivalents Change-Id: I4488cd7672cdb57efff93c0b3b8bff07f07ec544	2015-12-01 12:07:22 +00:00
Julia Robson	ef01ea152d	Changes so other expts work with 128x128 coding unit size expt Changes ensure wedge_partition, interintra and palette expts all work alongside the ext_coding_unit_size experiment. Change-Id: I18f17acb29071f6fc6784e815661c73cc21144d6	2015-11-26 17:43:00 +00:00
Julia Robson	3d9133b2a5	SSE2 optim of vp9_subtract_block for 128x128 units Extending the SSE2 implementation of vp9_subtract_block to work with the 128x128 coding unit experiment Change-Id: Ib3cc16bf5801ef2c7eecc19d3cc07a8c50631580	2015-11-13 11:12:56 -08:00
Debargha Mukherjee	59de0c0bc7	Adding encoder support for 128x128 coding units Changes to allow the encoder to make use of 128x128 coding units. Change-Id: I340bd38f9d9750cb6346d83885efb00443852910	2015-11-13 09:21:22 -08:00
Shunyao Li	2de18d1fd2	Super resolution mode (+CONFIG_SR_MODE) CONFIG_SR_MODE=1, enable SR mode USE_POST_F=1, enable SR post filter SR_USE_MULTI_F=1, enable SR post filter family Not compatible with other experiments yet Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec	2015-08-31 15:29:39 -07:00
Debargha Mukherjee	4b57a8b356	Add extended transforms for 32x32 and 64x64 Framework for alternate transforms for inter 32x32 and larger based on dwt-dct hybrid is implemented. Further experiments are to be conducted with different variations of hybrid dct/dwt or plain dwt, as well as super-resolution mode. Change-Id: I9a2bf49ba317e7668002cf1499211d7da6fa14ad	2015-07-23 18:01:22 -07:00
Debargha Mukherjee	b433dd4443	Adds wavelet transforms + hybrid dct/dwt variants The wavelets implemented are 2/6, 5/3 and 9/7 each with a lifting based scheme for even block sizes. The 9/7 one is a double implementation currently. This is to start experiments with: 1. Replacing large transforms (32x32 and 64x64) with wavelets or wavelet-dct hybrids that can hopefully localize errors better spatially. (Will also need alternate entropy coder) 2. Super-resolution modes where the higher sub-bands may be selectively skipped from being conveyed, while a smart reconstruction recovers the lost frequencies. The current patch includes two types of 32x32 and 64x64 transforms: one where only wavelets are used, and another where a single level wavelet decomposition is followed by a lower resolution dct on the low-low band. Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec	2015-06-08 23:30:38 -07:00
Peter de Rivaz	d6153aa447	Added highbitdepth sse2 acceleration for quantize and block error This is a partial cherry-pick of db7192e Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2015-05-06 15:14:01 -07:00
Peter de Rivaz	2dad1a7c8e	Added high bitdepth sse2 transform functions Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282	2015-05-06 10:10:18 -07:00
Peter de Rivaz	2189a51891	Added sse2 acceleration for highbitdepth variance This is a combination of: 4a19fa6 Added sse2 acceleration for highbitdepth variance c6f5d3b Fix high bit depth assembly function bugs Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f	2015-05-06 10:04:08 -07:00
Peter de Rivaz	0e82cba628	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I9f09e404e3136951e5cc15bf40b915c1fe10b620	2015-05-06 09:00:53 -07:00
Debargha Mukherjee	343c092e2e	High bit-depth support for wedge partition expt Change-Id: Idbd27e66d4f4a7953f888137d5752856215a6760	2015-04-13 09:28:15 -07:00
Deb Mukherjee	8c5ac79e66	Some build fixes with highbitdepth and new quant Highbitdepth performance about the same as 8-bit. Change-Id: If737962d8588dd190083edae4383b731f9d22873	2015-03-21 06:53:58 -07:00
Deb Mukherjee	c8ed36432e	Non-uniform quantization experiment This framework allows lower quantization bins to be shrunk down or expanded to more closely match the source distribution (assuming a generalized Gaussian-like central peaky model for the coefficients) in an entropy-constrained sense. Specifically, the widths of bins 0-4 are modified as a factor of the nominal quantization step size, and from 5 onwards all bins become the same as the nominal quantization step size. Further, different bin width profiles as well as reconstruction values can be used based on the coefficient band as well as the quantization step size divided into 5 ranges. A small gain currently on derflr of about 0.16% is observed with the same parameters for all q values. Optimizing the parameters based on qstep value is left as a TODO for now. Results on derflr with all expts on is +6.08% (up from 5.88%). Experiments are in progress to tune the parameters for different coefficient bands and quantization step ranges. Change-Id: I88429d8cb0777021bfbb689ef69b764eafb3a1de	2015-03-17 21:42:55 -07:00
Jingning Han	50cab76f12	Removal of legacy zbin_extra / zbin_oq_value. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07 (cherry picked from commit d0f237702745c4bfc0297d24f9465f960fb988ed)	2015-01-21 20:37:19 -08:00
Deb Mukherjee	db5dd49996	Adds wedge-partitions for compound prediction Results with this experiment only: +0.642% on derflr. With other experiments: +4.733% Change-Id: Ieb2022f8e49ac38a7e7129e261a6bf69ae9666b9	2015-01-15 15:59:33 -08:00
Deb Mukherjee	8bdf4cebb9	Merge "Adding a 64x64 transform mode" into nextgen	2014-10-30 00:51:35 -07:00
Deb Mukherjee	0c7a94f49b	Adding a 64x64 transform mode Preliminary 64x64 transform implementation. Includes all code changes. All mismatches resolved. Coding results for derf and stdhd are within noise. stdhd is slightly higher, derf is slightly lower. To be further refined. Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b	2014-10-30 00:45:57 -07:00
Yunqing Wang	687c56e802	Merge "SAD32xh and SAD64xh for AVX2"	2014-10-20 12:37:55 -07:00
levytamar82	7045aec00a	SAD32xh and SAD64xh for AVX2 All SAD functions that process 32 or more consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64. vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%; together they gave an overall ~2.3% user-level gain. Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd	2014-10-19 13:59:10 -07:00
Peter de Rivaz	73ae6e495c	Add highbitdepth function for vp9_avg_8x8 Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/ (a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch. Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8	2014-10-17 17:04:37 -07:00
Alex Converse	7497d2fb23	Add a 32-bit friendly sse2 quantizer. This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448	2014-10-14 11:37:41 -07:00
Deb Mukherjee	9a29fdbae7	Merge "Rename highbitdepth functions to use highbd prefix"	2014-10-09 15:39:56 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
James Zern	caa0f81914	vp9_rtcd_defs: fix vp9_avg_8x8 declaration vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd	2014-10-09 10:44:42 +02:00
Jim Bankoski	0ce51d823f	experimental : partition using 1/8 x 1/8 image The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624	2014-10-07 16:36:14 -07:00
JackyChen	a9f479682a	Merge "Add SSE2 code and unit test for VP9 denoiser."	2014-10-07 10:51:55 -07:00
JackyChen	80465dae88	Add SSE2 code and unit test for VP9 denoiser. This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20% (using C code vs using SSE2 code), varying for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block sizes. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d	2014-10-06 15:27:40 -07:00
Deb Mukherjee	d50716face	Incorporate WRAPLOW macro into non-highbitdepth tx Incorporates the WRAPLOW macro into the non-highbitdepth transforms to aid hardware verification between a software C model and an intended hardware implementation through the use of the configure options: --enable-experimental --enable-emulate-hardware. Note that to avoid further discrepancies between the sse/sse2 implementations of the transforms and the C implementation, when the emulate hardware option is invoked, we also disable sse/sse2/etc. Also includes some minor cleanups/renaming etc. Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287	2014-10-03 11:38:05 -07:00
Deb Mukherjee	a160d72522	High-bitdepth bugfixes Miscellaneous bug-fixes for high bitdepth functionality. With this patch, high bit-depth profiles become mostly functional, except for an intermittent assert failure issue that is being tracked. Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c	2014-10-01 14:18:11 -07:00
Deb Mukherjee	872b207b78	Moves transform type defines to vp9_common Moves transform type defines to vp9_common.h from vp9_idct.h so that they can be included in vp9_rtcd_defs.pl safely. Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb	2014-09-30 19:44:17 -07:00
James Zern	4a296e6baa	Revert "Fix compiling error in vp9_idct.h" This reverts commit eafc8c9c40d712aabe234bed5269a02c62fa0bfc. tran_low_t/tran_high_t don't belong in a public header, they're private. Similarly the public headers shouldn't rely on config defines, vpx_config.h isn't installed. Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad	2014-09-30 16:08:55 -07:00
Jingning Han	eafc8c9c40	Fix compiling error in vp9_idct.h This commit fixes a compiling error in vp9_idct.h, where the codec checks that the intermediate steps of transformation fit within 16-bit length. The issue was due to broken file dependency. Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc	2014-09-30 09:11:59 -07:00
Deb Mukherjee	931ed516ba	High bit-depth loop/arf/postproc filter functions Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3	2014-09-23 16:20:43 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Deb Mukherjee	5cd0aab81a	Adds high bitdepth quantization functions Adds various high bitdepth quantization functions. Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c	2014-09-16 14:55:37 -07:00
Yaowu Xu	601f3a886e	Fix a performance regression This commit adds back sse2 or ssse3 optimized versions of a couple of functions, fixing a ~10% performance regression. Change-Id: I049786906e5a641224dced63c6492aec9d86d183	2014-09-16 11:18:46 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Many of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Deb Mukherjee	1e4136d35d	Adds high bit depth sad and variance functions Moves high bit depth sad/var functions from highbitdepth branch to master. Change-Id: If03845d8ef9c9c494e13350e7a587c289306b94d	2014-09-11 17:30:44 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00
Dmitry Kovalev	980abf6078	Fixing Mac OS build. Change-Id: Ifae8906185a868a07685eb7a7da2484af95e70a7	2014-09-08 08:53:12 -07:00
Dmitry Kovalev	89963bf586	Merge "Removing postproc mmx code."	2014-09-05 18:11:08 -07:00
Dmitry Kovalev	1100e262c5	Removing postproc mmx code. Removed functions: * vp9_post_proc_down_and_across_mmx * vp9_mbpost_proc_down_mmx * vp9_plane_add_noise_mmx They all have sse2 equivalent. Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff	2014-09-05 11:52:50 -07:00
James Zern	a8083449e9	fix x86-darwin* build vp9_variance_sse2.c contains a mix of intrinsics and references to assembly which uses x86inc.asm; it's conditionally included as a result. Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec	2014-09-04 23:32:13 -07:00
Dmitry Kovalev	490943552f	Removing unused function prototypes. Change-Id: Ia5e383e2cf18052f6f1eacf8b9495ab8e4d58878	2014-09-04 14:26:30 -07:00
Dmitry Kovalev	48197f0a70	Adding sse2 variant for vp9_mse{8x8, 8x16, 16x8}. Change-Id: I6786d25ce4f32b8d8912f2d239a45ca15b310c4b	2014-09-03 19:02:14 -07:00
Dmitry Kovalev	318fc0c34f	Removing MMX SAD calculation code. Removed functions: * vp9_sad_16x16_mmx * vp9_sad_8x16_mmx * vp9_sad_16x8_mmx * vp9_sad_8x8_mmx * vp9_sad_4x4_mmx Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3	2014-09-02 14:41:36 -07:00


114 Commits