generic-library/vpx

Author	SHA1	Message	Date
levytamar82	52dac5d1cb	AVX2 SubPixel Variance Optimization Optimizing 2 functions to process 32 elements in parallel instead of 16: 1. vp9_sub_pixel_variance64x64 2. vp9_sub_pixel_variance32x32 both of those function were calling vp9_sub_pixel_variance16xh_ssse3 instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2 that is written in avx2 and process 32 elements in parallel. This Optimization gave 70% function level gain and 2% user level gain Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde	2014-02-14 16:59:11 -07:00
James Zern	66bfc69bfc	Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/"	2014-02-10 15:39:28 -08:00
Dmitry Kovalev	005fc6970b	Finally removing "short" from transform names. Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d	2014-02-06 11:54:15 -08:00
Dmitry Kovalev	8b53947a42	Renaming vp9_sad_c.c to vp9_sad.c. Change-Id: I0beb01b0209cf4ae849b4c67d72107b631f46c0d	2014-02-05 11:31:15 -08:00
James Zern	7cf0c783c1	*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/ CONFIG_USE_X86INC is available to every makefile, there's no need to duplicate its value with USE_X86INC Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6	2014-02-04 20:04:38 -08:00
Dmitry Kovalev	9d6d35c5ef	Renaming vp9_variance_c.c to vp9_variance.c. Change-Id: I7b29cb18ad36d79e1c6329c7de88496059f49db4	2014-02-04 14:49:43 -08:00
Deb Mukherjee	3cd37dfeb5	Adds a non-normative resize library to vp9 encoder Adds an arbitrary-size resize library for use in scaling of input frames in a non-normative manner in the vp9 encoder. The method used is as follows: Downsampling - Uses a 8 tap filter for factor of 2 decimation upto a size just higher than the desired size. Then interpolates pixels at a precision of 1/32 pel using a set of 8-tap filters. Upsampling - Interpolates pixels at a precision of 1/32 pel using a set of 8-tap filters. There is no assembly optimization yet. Change-Id: Ib5b81e174fc139da322bb97c8214d52289d60d8a	2014-01-21 16:50:00 -08:00
Dmitry Kovalev	c2b5a39345	Removing duplicated SAD calculation code. Change-Id: I8d693371a29103769d5bed9d5f9cfe4f58ca3189	2014-01-21 14:24:37 -08:00
Jingning Han	2f52decd22	Inter-frame non-RD mode decision This commit setups a test framework for real-time coding. It enables a light motion search for non-RD mode decision purpose. Change-Id: I8bec656331539e963c2b685a70e43e0ae32a6e9d	2014-01-16 12:35:04 -08:00
levytamar82	357b65369f	AVX2 Variance Optimization Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32, vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, vp9_mse16x16 by migrating to AVX2 some of the functions were optimized by processing 32 elements instead of 16. some of the functions were optimized by processing 2 loop strides of 16 elements in a single 256 bit register This optimization gives between 2.4% - 2.7% user level performance gain and 42% function level gain. Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d	2014-01-08 12:05:53 -07:00
Dmitry Kovalev	4084566554	Renaming vp9_boolcoder.{h, c} to vp9_writer.{h, c}. Change-Id: I9b9a5fcce8530284df0f270706ee060a0edc1517	2013-12-20 11:10:24 -08:00
Yaowu Xu	e9c19617bf	Merge "vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2"	2013-11-27 10:27:32 -08:00
Dmitry Kovalev	204ff1c868	Removing vp9_modecosts.{c, h} files. Renaming vp9_init_mode_costs() to fill_mode_costs() and moving it to vp9_rdopt.c. Change-Id: Ib2542d216458f6dced9f4b7ccbdd2cd98176aa5a	2013-11-25 12:44:05 -08:00
levytamar82	8def766de2	vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2 Change-Id: I6366e84490883b72362f762369d7e5bccb64f02f	2013-11-21 14:19:49 -08:00
Yaowu Xu	30b03050a2	Move vp9_sadmxn.h from common to encoder Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b	2013-11-19 12:46:08 -08:00
Yaowu Xu	1c61e1960d	Move vp9_extend.{h,c} from common to encoder Since they used in encoder only. This commit also re-order includes for the files that include vp9_extend.h Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459	2013-11-18 12:43:36 -08:00
Dmitry Kovalev	5380739a87	Removing vp9_encodeintra.{h, c} files. There was only one function in *.c file, so moving it to vp9_encodemb.c. Change-Id: I728859d08b3d6c05c33c1c5b21f0ea1d0e0f83af	2013-11-15 12:17:16 -08:00
Dmitry Kovalev	ae2f732e8c	Adding fht{4x4, 8x8, 16x16} functions. Adding these functions to encapsulate tx_type check. Changing TX_TYPE to int to match the declaration in vo9_rtch.h. Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f	2013-10-25 17:55:07 -07:00
Guillaume Martres	e55f60240a	Implement variance-based adaptive quantization This should be similar to what x264 does with --aq-mode 1. It works well with clips like parkjoy and touhou (http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv). At low bitrates, the segmentation signaling overhead may negate the benefits of this feature. (PGW) Default changed to feature OFF to allow provisional merge. Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b	2013-10-16 11:55:13 +01:00
Jim Bankoski	79401542f7	make vp9 postproc a config option Vp9 postproc is disabled for now as its not been shown to help and may be merged with vp8. Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057	2013-09-04 10:02:08 -07:00
Jim Bankoski	5b307886fb	variance x86inc guards also fixed bug in sad calcs Change-Id: I6571fcbe37556c16ae32be66dc0fd879852aac1d	2013-08-06 14:17:13 -07:00
Jim Bankoski	c9126e0b30	sad + miscellaneous updates Enable use_x86inc as a commandline option. Fix Bug with sse2 when x86inc is disabled. Adds Sad asm protection to x86inc protection Change-Id: Iee0f9dd235ea10e8ace512eb362ba9bebe8c9df6	2013-08-06 12:16:04 -07:00
Jingning Han	7d61f8fe53	Merge "Move fdct32x32 SSE2 implementation in separate file."	2013-08-06 10:46:41 -07:00
Christian Duvivier	3d98205fce	Move fdct32x32 SSE2 implementation in separate file. This is in preparation for the SSE2 version of the high-precision 32x32 forward DCT which will share a lot of code with the existing low precision version used for rate-distortion search. Change-Id: I7084b6bdfb480b1fabb8493fb14e3f7fcc7888c0	2013-08-06 10:17:11 -07:00
Jim Bankoski	62c6aa884d	block error / x86inc mods Change-Id: Icb607745634e10b9bac5019d06661ece09fcdb40	2013-08-06 06:23:38 -07:00
Jim Bankoski	a93b115cd6	reworked config for use_x86_inc Support enabling it or disabling it. Moved read out to configure.sh so that its done once instead of in make and in config. Change-Id: I73a9190cf31de9f03e8a577f478fa522f8c01c8b	2013-08-05 17:35:25 -07:00
Ronald S. Bultje	c13e0bcb52	Remove unused fwalsh/fdct x86 SIMD implementations. Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e	2013-07-10 18:22:51 -07:00
Yaowu Xu	ba3b2604f0	Merge "Quantize (64-bit only, for now) SSSE3 SIMD."	2013-07-01 15:58:57 -07:00
Ronald S. Bultje	7353ceab9d	Quantize (64-bit only, for now) SSSE3 SIMD. Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904	2013-07-01 11:36:07 -07:00
Dmitry Kovalev	bb8ccf1caf	Moving encoder subexp encoding functions to subexp.{h, c}. Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca	2013-06-29 11:50:45 -07:00
Ronald S. Bultje	54b2a59623	Implement SSE2 block_error. Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68	2013-06-21 12:54:52 -07:00
Ronald S. Bultje	25c588b1e4	Add subtract_block SSE2 version and unit test. 3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e	2013-06-21 09:35:37 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles sse2 4x8: 128 cycles sse2 8x4: 121 cycles sse2 8x8: 149 -> 129 cycles sse2 8x16: 235 -> 245 cycles (?) sse2 16x8: 269 -> 203 cycles sse2 16x16: 441 -> 349 cycles sse2 16x32: 641 cycles sse2 32x16: 643 cycles sse2 32x32: 1733 -> 1154 cycles sse2 32x64: 2247 cycles sse2 64x32: 2323 cycles sse2 64x64: 6984 -> 4442 cycles ssse3 4x4: 100 cycles (?) ssse3 4x8: 103 cycles ssse3 8x4: 71 cycles ssse3 8x8: 147 cycles ssse3 8x16: 158 cycles ssse3 16x8: 188 -> 162 cycles ssse3 16x16: 316 -> 273 cycles ssse3 16x32: 535 cycles ssse3 32x16: 564 cycles ssse3 32x32: 973 cycles ssse3 32x64: 1930 cycles ssse3 64x32: 1922 cycles ssse3 64x64: 3760 cycles Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
Ronald S. Bultje	d9fc451666	Move subpixel variance function from common/ to encoder/. This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b	2013-06-17 16:54:09 -07:00
Dmitry Kovalev	18c83b3714	Compressed/uncompressed frame header changes. Adding API to read/write uncompressed frame header bits (it is not final yet). Separate functions to read/write uncompressed header. Moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode, frame_context_idx from compressed partition to uncompressed frame header. Change-Id: Id3ed8a387980c652ae147549412f4ec24a0a5bd0	2013-05-28 18:07:54 -07:00
Dmitry Kovalev	1a24011469	Revert "Adding API to read/write uncompressed frame header bits." because of bitstream mismatches. This reverts commit `df037b615f` Change-Id: I1a529f2590df7bc912f5035d22311268933e3dd6	2013-05-28 02:24:52 -07:00
Dmitry Kovalev	df037b615f	Adding API to read/write uncompressed frame header bits. The API is not final yet and can be changed. Actual layout of uncompressed frame part will be finalized later. Right now moving clr_type, error_resilient_mode, refresh_frame_context, frame_parallel_decoding_mode from first compressed partition to uncompressed frame part. Change-Id: I3afc5d4ea92c5a114f4c3d88f96858cccc15b76e	2013-05-21 15:31:32 -07:00
Johann	a62fcbea30	Automatically flag intrinsic files Change-Id: Iee9894615265d42aa23c43a4183924953aedb0c6	2013-05-03 15:35:13 -07:00
Johann	e43662e8e6	Remove unused quantize optimizations. Files were copied from vp8 and never maintained. Change-Id: I9659a8755985da73e8c19c3c984423b6666d8871	2013-04-30 18:42:05 -07:00
Johann	32a5c52856	Merge branch 'master' into experimental Conflicts: vp9/common/vp9_findnearmv.c vp9/common/vp9_rtcd_defs.sh vp9/decoder/vp9_decodframe.c vp9/decoder/x86/vp9_dequantize_sse2.c vp9/encoder/vp9_rdopt.c vp9/vp9_common.mk Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f	2013-04-26 12:57:10 -07:00
Johann	863601c589	Normalize more intrinsic filenames vp9_dequantize_x86 has only sse2 functions. vp9_dct_sse2_intrinsics has no namespace collision and can drop _intrinsics. vp9_idct_mmx.h is unused. Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec	2013-04-25 23:26:20 -07:00
John Koleszar	15255eef82	Move dequant from BLOCKD to per-plane MACROBLOCKD This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a	2013-04-25 11:57:20 -07:00
Christian Duvivier	5b6d33f9af	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda	2013-04-16 16:38:30 -07:00
Christian Duvivier	f13b69d07c	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda	2013-04-16 16:11:56 -07:00
Jim Bankoski	8f270acfb2	mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df	2013-02-28 13:58:15 -08:00
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
John Koleszar	5ac141187a	Merge "Remove unused vp9_copy32xn" into experimental	2013-02-27 12:23:45 -08:00
Ronald S. Bultje	e8c74e2b70	Move eob from BLOCKD to MACROBLOCKD. Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07	2013-02-27 11:00:55 -08:00
John Koleszar	7ad8dbe417	Remove unused vp9_copy32xn This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d	2013-02-27 10:24:56 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
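
Several commits above (8b53947a42, c2b5a39345) touch the sum-of-absolute-differences (SAD) code used during motion search. For illustration only, a plain C sketch of a SAD over a w x h block might look like this. The name `sad_sketch` and its argument order are hypothetical, not the libvpx API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: sum of absolute differences between a w x h
   source block and a reference block, each with its own row stride. */
unsigned int sad_sketch(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c)
      sad += (unsigned int)abs(src[c] - ref[c]);
    src += src_stride; /* advance both pointers to the next row */
    ref += ref_stride;
  }
  return sad;
}
```

SIMD versions compute the same quantity, but they handle a whole row (or several rows) of absolute differences per instruction.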
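
Commit 357b65369f ports the variance functions to AVX2. Whatever the SIMD width, these kernels reduce a block to two accumulators: the sum of pixel differences and the sum of squared differences (SSE). They then combine the two as variance = SSE - sum^2 / (w * h). Below is a minimal scalar sketch. It assumes 8-bit pixels and blocks no larger than 64x64, so the SSE fits in 32 bits. `variance_sketch` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar reference: returns SSE - sum^2 / (w * h) for a
   w x h block and writes the SSE to *sse. */
unsigned int variance_sketch(const uint8_t *a, int a_stride,
                             const uint8_t *b, int b_stride,
                             int w, int h, unsigned int *sse) {
  int64_t sum = 0;    /* sum of differences, may be negative */
  uint64_t sse64 = 0; /* sum of squared differences */
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int diff = a[r * a_stride + c] - b[r * b_stride + c];
      sum += diff;
      sse64 += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sse64;
  return (unsigned int)(sse64 - (uint64_t)((sum * sum) / (w * h)));
}
```

Every VP9 block size has a power-of-two area, so an implementation can replace the division with a right shift. The division is kept here for clarity.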
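
Commits 8fb6c58191 and 52dac5d1cb optimize the sub-pixel variance functions. These measure variance against a reference after the source has been interpolated at a fractional offset. The sketch below uses a two-pass bilinear filter: a horizontal pass over h + 1 rows, then a vertical pass. The 1/8-pel taps (128 - 16k, 16k), the 7-bit rounding, and the name `sub_pixel_variance_sketch` are all assumptions for illustration and are not taken from the commits.

```c
#include <stdint.h>

#define FILTER_BITS 7
#define MAX_BLOCK 64

/* Assumed 1/8-pel bilinear taps: (128 - 16k, 16k); they sum to 128. */
static void bilinear_taps(int offset, int *t0, int *t1) {
  *t1 = 16 * offset;
  *t0 = 128 - *t1;
}

/* src must be readable for h + 1 rows and w + 1 columns: the horizontal
   pass reads one pixel to the right, the vertical pass one row below.
   w and h must not exceed MAX_BLOCK. */
unsigned int sub_pixel_variance_sketch(const uint8_t *src, int src_stride,
                                       int xoffset, int yoffset,
                                       const uint8_t *ref, int ref_stride,
                                       int w, int h, unsigned int *sse) {
  uint16_t tmp[(MAX_BLOCK + 1) * MAX_BLOCK];
  const int round = 1 << (FILTER_BITS - 1);
  int t0, t1;

  /* Pass 1: horizontal filter over h + 1 rows. */
  bilinear_taps(xoffset, &t0, &t1);
  for (int r = 0; r < h + 1; ++r)
    for (int c = 0; c < w; ++c) {
      const uint8_t *p = src + r * src_stride + c;
      tmp[r * w + c] =
          (uint16_t)((p[0] * t0 + p[1] * t1 + round) >> FILTER_BITS);
    }

  /* Pass 2: vertical filter, then accumulate sum and SSE against ref. */
  bilinear_taps(yoffset, &t0, &t1);
  int64_t sum = 0;
  uint64_t sse64 = 0;
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c) {
      const int pred =
          (tmp[r * w + c] * t0 + tmp[(r + 1) * w + c] * t1 + round) >>
          FILTER_BITS;
      const int diff = pred - ref[r * ref_stride + c];
      sum += diff;
      sse64 += (uint64_t)(diff * diff);
    }
  *sse = (unsigned int)sse64;
  return (unsigned int)(sse64 - (uint64_t)((sum * sum) / (w * h)));
}
```

The first pass filters one extra row so that the vertical pass has a row below the last output row to blend with.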
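
Commit 54b2a59623 changes vp9_block_error() to return a 64-bit value to prevent overflows. The sketch below shows why widening matters. Two int16_t coefficients can differ by up to 65535, and 65535^2 already exceeds INT32_MAX, so the square has to be formed in 64 bits. The name `block_error_sketch` and its signature are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical sketch: squared error between original and dequantized
   coefficients, accumulated in 64 bits. The difference is widened before
   squaring because (int) * (int) could overflow for large differences. */
int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                           int block_size) {
  int64_t error = 0;
  for (int i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
```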
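
Commit 25c588b1e4 adds an SSE2 version of subtract_block. That function forms the residual (source minus prediction) that the forward transform consumes. Below is a scalar sketch. Its argument order (rows, cols, then the destination, source, and prediction, each with its own stride) is an assumption for illustration, and `subtract_block_sketch` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: diff = src - pred for a rows x cols block.
   The result is signed 16-bit because 8-bit differences span -255..255. */
void subtract_block_sketch(int rows, int cols,
                           int16_t *diff, ptrdiff_t diff_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           const uint8_t *pred, ptrdiff_t pred_stride) {
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c)
      diff[c] = (int16_t)(src[c] - pred[c]);
    diff += diff_stride; /* move all three pointers to the next row */
    src += src_stride;
    pred += pred_stride;
  }
}
```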
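
Commits 5b6d33f9af and f13b69d07c speed up vp9_short_fdct4x4. A 4x4 forward DCT is usually built from a 1-D 4-point butterfly applied first to columns and then to rows. The sketch below shows only that 1-D pass. It assumes 14-bit fixed-point cosine constants, round(2^14 * cos(k * pi / 64)), with round-to-nearest shifting. The extra per-pass scaling that a full 2-D transform applies is omitted, and `fdct4_1d_sketch` is a hypothetical name.

```c
#include <stdint.h>

#define DCT_CONST_BITS 14
static const int cospi_8_64 = 15137;  /* round(16384 * cos(8 * pi / 64)) */
static const int cospi_16_64 = 11585; /* round(16384 * cos(16 * pi / 64)) */
static const int cospi_24_64 = 6270;  /* round(16384 * cos(24 * pi / 64)) */

/* Round to nearest, then drop the 14 fractional bits. */
static int round_shift(int64_t v) {
  return (int)((v + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS);
}

/* One 1-D 4-point forward DCT pass written as a butterfly. */
void fdct4_1d_sketch(const int in[4], int out[4]) {
  const int s0 = in[0] + in[3]; /* even part */
  const int s1 = in[1] + in[2];
  const int s2 = in[1] - in[2]; /* odd part */
  const int s3 = in[0] - in[3];
  out[0] = round_shift((int64_t)(s0 + s1) * cospi_16_64);
  out[2] = round_shift((int64_t)(s0 - s1) * cospi_16_64);
  out[1] = round_shift((int64_t)s2 * cospi_24_64 + (int64_t)s3 * cospi_8_64);
  out[3] = round_shift((int64_t)s3 * cospi_24_64 - (int64_t)s2 * cospi_8_64);
}
```

A flat input produces energy only in out[0], the DC term. For example, {100, 100, 100, 100} gives 400 * cos(pi / 4), which is about 283.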
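
Commit 3cd37dfeb5 describes interpolating pixels at 1/32-pel precision. One way to picture this is to track each output sample's source position in Q5 fixed point. The integer part selects the source pixel, and the low five bits select one of 32 filter phases. The sketch below does this for a single row but makes two simplifying assumptions: it substitutes a 2-tap linear filter for the library's 8-tap filters, and it maps positions without centering. `interpolate_row_sketch` is a hypothetical name.

```c
#include <stdint.h>

#define SUBPEL_BITS 5 /* 1/32-pel precision */
#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)

/* Hypothetical sketch: resample in[0..in_len) to out[0..out_len).
   Uses linear interpolation as a stand-in for per-phase 8-tap filters. */
void interpolate_row_sketch(const uint8_t *in, int in_len,
                            uint8_t *out, int out_len) {
  /* Distance between output samples, in 1/32-pel units of the input. */
  const int64_t step = ((int64_t)in_len << SUBPEL_BITS) / out_len;
  for (int i = 0; i < out_len; ++i) {
    const int64_t pos = (int64_t)i * step;
    const int x = (int)(pos >> SUBPEL_BITS);    /* integer source pixel */
    const int phase = (int)(pos & SUBPEL_MASK); /* 0..31 */
    const int x1 = x + 1 < in_len ? x + 1 : in_len - 1; /* clamp at edge */
    out[i] = (uint8_t)((in[x] * (32 - phase) + in[x1] * phase + 16) >>
                       SUBPEL_BITS);
  }
}
```

A 2-tap linear blend of 8-bit values always stays within [0, 255], so no clipping is needed here. An 8-tap filter with negative taps would need a clamp.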


59 Commits