generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	0084e61d5f	Tune the rounding operations in 8x8 ADST/DCT sse2 Improve the round-trip precision to meet the unit test settings. Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79	2013-06-25 12:02:26 -07:00
Jingning Han	67365520e7	Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT"	2013-06-25 09:49:03 -07:00
Yaowu Xu	b9c934df8e	Merge "Enable sse2 implementation of 8x8 ADST/DCT"	2013-06-25 09:13:22 -07:00
Jingning Han	82d504b50f	Use aligned buffer operations in 8x8/16x16 2D-DCT This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles. Change-Id: I137758b81cd127b936175284310e81378db64552	2013-06-24 19:56:23 -07:00
Jingning Han	a32a086d23	Enable sse2 implementation of 8x8 ADST/DCT This commit makes use of the butterfly structure to enable the sse2 version implementation of 8x8 ADST/DCT hybrid transform coding. The runtime of hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f	2013-06-24 18:41:33 -07:00
Ronald S. Bultje	fc033b38ee	Remove emms - that shouldn't be there. Change-Id: I8fcab81e390f93dc17e9666bbf8f77883b5aa897	2013-06-21 14:45:04 -07:00
Ronald S. Bultje	ba42c02654	Add missing SECTION .text marker in assembly file. Fixes a crash on Windows when building with MSVC. Change-Id: I124ac756a1be55d190fadda5fcc46d23b1445dbf	2013-06-21 12:55:46 -07:00
Ronald S. Bultje	54b2a59623	Implement SSE2 block_error. Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68	2013-06-21 12:54:52 -07:00
Ronald S. Bultje	25c588b1e4	Add subtract_block SSE2 version and unit test. 3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e	2013-06-21 09:35:37 -07:00
Ronald S. Bultje	1e6a32f1af	SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to 3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't perfectly interleaved, and can probably be improved further in the future. I've marked this with a few TODOs/FIXMEs in the code. Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9	2013-06-20 15:59:48 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles; sse2 4x8: 128 cycles; sse2 8x4: 121 cycles; sse2 8x8: 149 -> 129 cycles; sse2 8x16: 235 -> 245 cycles (?); sse2 16x8: 269 -> 203 cycles; sse2 16x16: 441 -> 349 cycles; sse2 16x32: 641 cycles; sse2 32x16: 643 cycles; sse2 32x32: 1733 -> 1154 cycles; sse2 32x64: 2247 cycles; sse2 64x32: 2323 cycles; sse2 64x64: 6984 -> 4442 cycles; ssse3 4x4: 100 cycles (?); ssse3 4x8: 103 cycles; ssse3 8x4: 71 cycles; ssse3 8x8: 147 cycles; ssse3 8x16: 158 cycles; ssse3 16x8: 188 -> 162 cycles; ssse3 16x16: 316 -> 273 cycles; ssse3 16x32: 535 cycles; ssse3 32x16: 564 cycles; ssse3 32x32: 973 cycles; ssse3 32x64: 1930 cycles; ssse3 64x32: 1922 cycles; ssse3 64x64: 3760 cycles. Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
Ronald S. Bultje	d9fc451666	Move subpixel variance function from common/ to encoder/. This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b	2013-06-17 16:54:09 -07:00
Jingning Han	0b7910b9ff	Merge "Enable sse2 version of sad8x4/4x8"	2013-06-14 13:15:49 -07:00
Jingning Han	15f50e7b42	Enable sse2 version of sad8x4/4x8 The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1	2013-06-13 16:18:18 -07:00
Ronald S. Bultje	fa96eeb835	Implement SSE version for sad4x8x4d and SSE2 version for sad8x4x4d. Encoding time of crew (CIF, first 50 frames) @ 1500kbps goes from 4min56 to 4min42. Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f	2013-06-12 17:40:01 -04:00
John Koleszar	d0ed677a34	Merge branch 'master' into experimental Change-Id: Ie648398b82f7311143709f55c0e30ba452f50eff	2013-06-11 16:29:28 -07:00
Yunqing Wang	f4fcfe3075	Optimize variance functions Added SSE2 version of variance functions for super blocks. Change-Id: Ibeaae8771ca21c99d41dd74067574a51e97b412d	2013-05-22 10:29:38 -07:00
Johann	e43662e8e6	Remove unused quantize optimizations. Files were copied from vp8 and never maintained. Change-Id: I9659a8755985da73e8c19c3c984423b6666d8871	2013-04-30 18:42:05 -07:00
Johann	32a5c52856	Merge branch 'master' into experimental Conflicts: vp9/common/vp9_findnearmv.c vp9/common/vp9_rtcd_defs.sh vp9/decoder/vp9_decodframe.c vp9/decoder/x86/vp9_dequantize_sse2.c vp9/encoder/vp9_rdopt.c vp9/vp9_common.mk Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f	2013-04-26 12:57:10 -07:00
Johann	e3038ca8b7	Whitespace nit Change-Id: I7486970c57cda75d26ec2c6d1f36bd668c955f66	2013-04-26 01:03:35 -07:00
Johann	863601c589	Normalize more intrinsic filenames vp9_dequantize_x86 has only sse2 functions. vp9_dct_sse2_intrinsics has no namespace collision and can drop _intrinsics. vp9_idct_mmx.h is unused. Change-Id: Ic16e31fb372a1d1e841a62ecb4189fe8f95808ec	2013-04-25 23:26:20 -07:00
John Koleszar	15255eef82	Move dequant from BLOCKD to per-plane MACROBLOCKD This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a	2013-04-25 11:57:20 -07:00
John Koleszar	cbd1315ac4	Move src_diff to per-plane MACROBLOCK data First in a series of commits making certain MACROBLOCK members addressable per-plane. This commit also refactors the block subtraction functions vp9_subtract_b, vp9_subtract_sby_c, etc to be loops-over-planes and variable subsampling aware. Change-Id: I371d092b914ae0a495dfd852ea1a3d2467be6ec3	2013-04-23 12:18:51 -07:00
Jingning Han	6f43ff5824	Make the use of pred buffers consistent in MB/SB Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction. This makes the macroblock buffer handling consistent with those of superblock. Remove predictor buffer MACROBLOCKD. Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df	2013-04-18 14:59:36 -07:00
Ronald S. Bultje	0c481f4d18	Add SSE2 versions for rectangular sad and sad4d functions. About 11% overall encoder speedup with the sbsegment experiment enabled. Change-Id: Iffb1bdba6932d9f11a6c791cda8697ccf9327183	2013-04-17 10:31:59 -07:00
Christian Duvivier	5b6d33f9af	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda	2013-04-16 16:38:30 -07:00
Christian Duvivier	f13b69d07c	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda	2013-04-16 16:11:56 -07:00
Ronald S. Bultje	b4f6098ef7	Make RD superblock mode search size-agnostic. Merge various super_block_yrd and super_block_uvrd versions into one common function that works for all sizes. Make transform size selection size-agnostic also. This fixes a slight bug in the intra UV superblock code where it used the wrong transform size for txsz > 8x8, and stores the txsz selection for superblocks properly (instead of forgetting it). Lastly, it removes the trellis search that was done for 16x16 intra predictors, since trellis is relatively expensive and should thus only be done after RD mode selection. Gives basically identical results on derf (+0.009%). Change-Id: If4485c6f0a0fe4038b3172f7a238477c35a6f8d3	2013-04-10 16:50:30 -07:00
John Koleszar	4c05a051ab	Move qcoeff, dqcoeff from BLOCKD to per-plane data Start grouping data per-plane, as part of refactoring to support additional planes, and chroma planes with other-than 4:2:0 subsampling. Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23	2013-04-04 16:30:57 -07:00
Yunqing Wang	6344c84c82	Optimize 8x8 idct function Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to c version, the sse2 version is 2X faster. The decoder test didn't show noticeable gain since 8x8 idct doesn't take much of decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3	2013-03-18 15:34:14 -07:00
Christian Duvivier	4418b790a7	Faster vp9_short_fdct16x16. Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289	2013-03-15 15:55:31 -07:00
John Koleszar	69c67c9531	Merge master branch into experimental Picks up some build system changes, compiler warning fixes, etc. Change-Id: I2712f99e653502818a101a72696ad54018152d4e	2013-03-01 11:06:05 -08:00
Jim Bankoski	078f5bf439	Merge "mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision" into experimental	2013-02-28 15:16:44 -08:00
Jim Bankoski	8f270acfb2	mv dct_sse2.c dct_sse2_intrinsics.c to avoid collision Change-Id: Id786be31da3c91d95d2955aa569ecdc6e66650df	2013-02-28 13:58:15 -08:00
Jim Bankoski	714aa9f3c0	this commit converts all sad ptrs to uint32 sse4_1 code used uint16_t for returning sad, but that won't work for 32x32 or 64x64. This code fixes the assembly for those and also reenables sse4_1 on linux Change-Id: I5ce7288d581db870a148e5f7c5092826f59edd81	2013-02-28 08:46:35 -08:00
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
John Koleszar	7ad8dbe417	Remove unused vp9_copy32xn This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d	2013-02-27 10:24:56 -08:00
Jan Kratochvil	82ed3f9a41	Fix --as=nasm compatibility for new asm code. s/movd/movq/ Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626	2013-02-27 09:55:38 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
Ronald S. Bultje	c0ce2ab349	Port sadNxNx4d functions to x86inc.asm. Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8	2013-02-08 17:59:32 -08:00
Ronald S. Bultje	02ff360b33	Add sad64x64 and sad32x32 SSE2 versions. Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this makes them all slightly faster, particularly on x86-64. Remove SSE3 sad16x16 version, since the SSE2 version is now faster. About 1.5% overall encoding speedup. Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797	2013-02-08 16:32:25 -08:00
Ronald S. Bultje	a788e0fe63	Add sse2 versions of sub_pixel_variance{32x32,64x64}. 7.5% faster overall encoding. Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f	2013-02-06 11:20:59 -08:00
Ronald S. Bultje	58c983d109	Add SSE3 versions for sad{32x32,64x64}x4d functions. Overall encoding about 15% faster. Change-Id: I176a775c704317509e32eee83739721804120ff2	2013-02-05 15:21:47 -08:00
Frank Galligan	f67d740b34	Add support for x64 and win64 yasm flags. Some projects must define only win64 for Windows 64bit builds using yasm. Change-Id: I1d09590d66a7bfc8b4412e1cc8685978ac60b748	2013-01-31 16:25:37 -08:00
Yaowu Xu	9bf73f46f9	Fix a number of issues that cause failures during master Jenkins verification process Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89	2013-01-14 18:32:32 -08:00
John Koleszar	5ebe94f9f1	Build fixes to merge vp9-preview into master Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07	2012-12-26 11:21:09 -08:00
Jim Bankoski	1dffce7f96	add private to assembly files to ensure proper chromebuild Change-Id: I6e43ca73f35401a974ed8ee27738d4318f09fd37	2012-12-20 09:40:18 -08:00
John Koleszar	4a4d2aa55c	vp9_bilinear_filters_mmx: add missing extern specifiers Change-Id: Ibabf18947f90cb4f45052763ebf44cfb8209bd8b	2012-12-05 08:27:48 -08:00
Jim Bankoski	b95338c7ab	Merge "fixes --disable-vp9-encoder" into vp9-preview	2012-12-03 12:41:31 -08:00
Jim Bankoski	d9038b3c60	fixes --disable-vp9-encoder Change-Id: I467bf0fdf3b35326bcce58d5459e6d2dbfd6c5e5	2012-12-03 12:21:16 -08:00
Jim Bankoski	2b8dc065d1	google style guide include guards Change-Id: I2c252f3ddcc99e96c1f5d3dab8bcb25a2a3637ea	2012-11-30 07:30:59 -08:00
Jim Bankoski	d0a20fd22c	last remaining warning Change-Id: I1f49d96cdb5e342041c9a72ef31df361a1b609eb	2012-11-29 14:07:21 -08:00
Jim Bankoski	85cba19e16	remove postproc invokes and some miscellaneous invoke left overs Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43	2012-11-28 10:00:25 -08:00
John Koleszar	fcccbcbb39	Add vp9_ prefix to all vp9 files Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc	2012-11-27 14:12:30 -08:00
Jim Bankoski	f4871b6a3f	clean out some of the rtcd code. This removes functions that are no longer needed and cleans up some warnings. Change-Id: I292a4c3694e9c1d68ce99cea390905b198434719	2012-11-18 12:33:18 -08:00
Jim Bankoski	cb98b83239	removal of temporal invoke Change-Id: I18ca713b02a5241bdb20dddcde0216467b55b596	2012-11-17 06:11:01 -08:00
John Koleszar	8959c8b11d	Merge with upstream experimental changes (2) Include upstream changes (variance fixes) into the merged code base. Change-Id: I4182654c1411c1b15cd23235d3822702613abce1	2012-11-07 14:32:26 -08:00
John Koleszar	2c08c28191	Merge with upstream experimental changes Include upstream changes (unit test fixes, in particular) into the merged code base. Change-Id: I096f8a9d09e2532fbec0c95d7a995ab22fa54b29	2012-11-07 11:46:23 -08:00
John Koleszar	7b8dfcb5a2	Rough merge of master into experimental Creates a merge between the master and experimental branches. Fixes a number of conflicts in the build system to allow either VP8 or VP9 to be built. Specifically either: $ configure --disable-vp9 $ configure --disable-vp8 --disable-unit-tests VP9 still exports its symbols and files as VP8, so that will be resolved in the next commit. Unit tests are broken in VP9, but this isn't a new issue. They are fixed upstream on origin/experimental as of this writing, but rebasing this merge proved difficult, so will tackle that in a second merge commit. Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21	2012-11-07 11:30:16 -08:00
James Zern	984734436d	Fix variance (signed integer) overflow In the variance calculations the difference is summed and later squared. When the sum exceeds sqrt(2^31) the value is treated as a negative when it is shifted which gives incorrect results. To fix this we force the multiplication to be unsigned. The alternative fix is to shift sum down by 4 before multiplying. However that will reduce precision. For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and change). This change is based on: `1698234` Missed some variance casts `fea3556` Fix variance overflow Change-Id: I2c61856cca9db54b9b81de83b4505ea81a050a0f	2012-11-06 23:06:44 -08:00
Jim Bankoski	7849aa20ed	remove invoke_search macro Removed invoke search from encoder Change-Id: I3d809b795abe6df0e71366edfe94026aaede14fb	2012-11-05 16:58:03 -08:00
Ronald S. Bultje	4b2c2b9aa4	Rename vp8/ codec directory to vp9/. Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4	2012-11-01 16:31:22 -07:00
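Commit 984734436d explains the variance overflow with a short arithmetic argument. The C sketch below illustrates that argument. `variance16x16_sketch` is a hypothetical helper, not the libvpx function, and it assumes the usual formulation variance = SSE - sum²/256 for a 16x16 block.

- Squaring `sum` as a signed `int` overflows once |sum| exceeds 46340.
- The worst-case product, 65280², still fits in an unsigned 32-bit value.

```c
#include <stdint.h>

/* Hypothetical sketch (not the libvpx API): variance numerator of a 16x16
 * block, computed as SSE - sum^2 / 256. */
static unsigned int variance16x16_sketch(const uint8_t *src, int src_stride,
                                         const uint8_t *ref, int ref_stride,
                                         unsigned int *sse_out) {
  int sum = 0;           /* signed: differences can be negative */
  unsigned int sse = 0;  /* at most 256 * 255^2 = 16,646,400 */
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int diff = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += diff;
      sse += (unsigned int)(diff * diff);
    }
  }
  *sse_out = sse;
  /* |sum| can reach 255 * 256 = 65280, above sqrt(2^31) ~ 46340, so
   * sum * sum in signed int would overflow. Converting to unsigned first
   * gives the exact square, because 65280^2 = 4,261,478,400 < 2^32. */
  return sse - (((unsigned int)sum * (unsigned int)sum) >> 8);
}
```

Casting before the multiplication keeps full precision. The alternative the commit message mentions, shifting `sum` down by 4 before squaring, avoids the overflow but loses low-order bits.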

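Commit 714aa9f3c0 notes that a `uint16_t` SAD return value cannot represent results for 32x32 or 64x64 blocks. The C sketch below makes that bound concrete. `sad_sketch` and `sad_truncated_u16` are hypothetical helpers, not libvpx APIs. The worst case is a difference of 255 per pixel, so the maximum SAD for each block size is:

- 16x16: 65280, which fits in 16 bits.
- 32x32: 261120, which does not fit.
- 64x64: 1044480, which does not fit.

```c
#include <stdint.h>

/* Hypothetical sketch: sum of absolute differences over a width x height block,
 * accumulated in 32 bits. */
static uint32_t sad_sketch(const uint8_t *src, int src_stride,
                           const uint8_t *ref, int ref_stride,
                           int width, int height) {
  uint32_t sad = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      sad += (uint32_t)(d < 0 ? -d : d);
    }
  }
  return sad;
}

/* What a 16-bit return type would report: the SAD reduced modulo 65536. */
static uint16_t sad_truncated_u16(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  int width, int height) {
  return (uint16_t)sad_sketch(src, src_stride, ref, ref_stride, width, height);
}
```

For a worst-case 64x64 block, the truncated value is 1044480 mod 65536 = 61440. That value looks like a plausible but wrong match cost, which is why widening the return type matters.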

112 Commits
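Commit 54b2a59623 changes `vp9_block_error()` to return a 64-bit value so the error sum does not overflow. The C sketch below illustrates why. `block_error_sketch` is a hypothetical stand-in, and its signature (two `int16_t` coefficient arrays plus a count) is an assumption rather than the exact libvpx prototype. The difference of two `int16_t` values can reach 65535 in magnitude. Its square (about 4.29 billion) already exceeds `INT32_MAX`, and a 32x32 block sums 1024 such terms.

```c
#include <stdint.h>

/* Hypothetical sketch: sum of squared differences between original and
 * dequantized transform coefficients, accumulated in 64 bits. */
static int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                                  int count) {
  int64_t error = 0;
  for (int i = 0; i < count; ++i) {
    /* int16_t operands promote to int, so the difference is exact. */
    const int diff = coeff[i] - dqcoeff[i];
    /* Widen before multiplying: diff * diff in int could exceed INT32_MAX. */
    error += (int64_t)diff * diff;
  }
  return error;
}
```

Even moderate coefficient errors overflow a 32-bit total on large blocks. For example, 1024 coefficients each off by 2000 sum to 4,096,000,000.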