generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	a32a086d23	Enable sse2 implementation of 8x8 ADST/DCT This commit makes use of the butterfly structure to enable the sse2 implementation of 8x8 ADST/DCT hybrid transform coding. The runtime of the hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f	2013-06-24 18:41:33 -07:00
Yaowu Xu	869d770610	Merge "Get some speed back for cpuused 1"	2013-06-20 22:37:01 -07:00
Yaowu Xu	45e25a7814	Get some speed back for cpuused 1 and remove unused code. Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01	2013-06-20 19:05:18 -07:00
Yaowu Xu	61721181ec	Merge "rename variables to avoid build error in MSVC"	2013-06-20 19:04:30 -07:00
Yaowu Xu	ee07a261a0	rename variables to avoid build error in MSVC Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34	2013-06-20 18:31:48 -07:00
Yaowu Xu	e6cd5ed307	Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes."	2013-06-20 17:42:50 -07:00
Dmitry Kovalev	8283d893eb	Merge "Renaming 'nmv' to 'mv' for several functions."	2013-06-20 10:17:12 -07:00
Deb Mukherjee	7947a33d72	Improving model rd with variance and quant step Improves the rd modeling function and implements them using interpolation from a table which is a little faster. Also uses sse as input to the modeling function rather than var - since there is no dc prediction used and as a result the sse works a little better. derfraw300: +0.05% Speedup: ~1% Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff	2013-06-20 10:06:28 -07:00
Jim Bankoski	9f2a1ae23e	adds force partitioning greater than or less than block size adds a new speed feature to force partitioning to be greater than or less than a certain size Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0	2013-06-20 09:51:42 -07:00
Jim Bankoski	18bdf708e7	adds a set partitioning to speed features this feature lets you set a partitioning size to be used by the entire frame. Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06	2013-06-20 09:50:44 -07:00
Jim Bankoski	476d73d294	partition by variance using var from last frame This uses variance to split partition. Variance is calculated using nearest mv, always from last ref frame. Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896	2013-06-20 09:48:22 -07:00
Jim Bankoski	1f94b97694	convert all speed things to speed features Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a	2013-06-20 09:42:44 -07:00
Jim Bankoski	727fa7b1e4	new partition via variance Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427	2013-06-20 09:42:05 -07:00
Jim Bankoski	0fad6a9d99	fix to set up new speed feature This uses the speed feature functionality for code. Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8	2013-06-20 09:35:02 -07:00
Jim Bankoski	df2314cfdd	don't copy partitions for key frames or altrefs force us to go through slow partitioning for keyframes, altref and overlays. Change-Id: I1a286361bf74083e71973575a7296be46eb98742	2013-06-20 09:34:32 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles; sse2 4x8: 128 cycles; sse2 8x4: 121 cycles; sse2 8x8: 149 -> 129 cycles; sse2 8x16: 235 -> 245 cycles (?); sse2 16x8: 269 -> 203 cycles; sse2 16x16: 441 -> 349 cycles; sse2 16x32: 641 cycles; sse2 32x16: 643 cycles; sse2 32x32: 1733 -> 1154 cycles; sse2 32x64: 2247 cycles; sse2 64x32: 2323 cycles; sse2 64x64: 6984 -> 4442 cycles; ssse3 4x4: 100 cycles (?); ssse3 4x8: 103 cycles; ssse3 8x4: 71 cycles; ssse3 8x8: 147 cycles; ssse3 8x16: 158 cycles; ssse3 16x8: 188 -> 162 cycles; ssse3 16x16: 316 -> 273 cycles; ssse3 16x32: 535 cycles; ssse3 32x16: 564 cycles; ssse3 32x32: 973 cycles; ssse3 32x64: 1930 cycles; ssse3 64x32: 1922 cycles; ssse3 64x64: 3760 cycles. Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
Jim Bankoski	f954490bbf	disable speed > 1 speed corrections in firstpass need to rework these Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856	2013-06-20 09:34:03 -07:00
Jim Bankoski	fbcce4dd6f	Merge "copy partitioning from last frame"	2013-06-20 09:32:43 -07:00
Jim Bankoski	f033b44e74	copy partitioning from last frame Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a	2013-06-20 09:32:19 -07:00
Yunqing Wang	3656835771	Merge "Add two-pass quantization"	2013-06-19 11:35:40 -07:00
Yunqing Wang	b5bf7b13a8	Add two-pass quantization Optimized the quantization function by making it a two-pass process. The first pass does a quick check of the transform coefficients against the base ZBIN, and keeps only the good enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in the first pass. This reduces the computation. Furthermore, an alternative method is used for large transform sizes, which often have sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22	2013-06-19 10:35:02 -07:00
Yaowu Xu	12180c8329	Remove unnecessary copying of probs. Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c	2013-06-18 23:02:27 -07:00
Dmitry Kovalev	87e1fa7627	Renaming 'nmv' to 'mv' for several functions. Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09	2013-06-18 18:28:10 -07:00
Jingning Han	7088426976	Merge "Make fdct32 computation flow within 16bit range"	2013-06-18 11:40:14 -07:00
Dmitry Kovalev	dfc0385291	Merge "Removing vp9_invtrans.{c, h} files."	2013-06-18 10:16:25 -07:00
Jingning Han	a41a4860c0	Make fdct32 computation flow within 16bit range This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3	2013-06-18 09:46:24 -07:00
Ronald S. Bultje	d9fc451666	Move subpixel variance function from common/ to encoder/. This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b	2013-06-17 16:54:09 -07:00
Dmitry Kovalev	686b99741c	Removing vp9_invtrans.{c, h} files. Moving single function from vp9_invtrans.c to vp9_encodemb.c. Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7	2013-06-17 16:09:03 -07:00
Ronald S. Bultje	a2f33e2505	Use assembly-optimized variance functions in sub_pixel_{avg}_var(). 2.5% faster when encoding first 50 frames of bus @ 1500kbps. Change-Id: I5a64703996cf7fd39b07e32c72311c4b125ec6d4	2013-06-17 14:57:13 -07:00
Ronald S. Bultje	53729c7786	Fix typo ('weight' instead of 'width'). Change-Id: I5d3944051d091b4bf3eb13e2a30132d34203ef74	2013-06-17 13:56:24 -07:00
John Koleszar	c2da365484	Merge "Remove constant vp9_coef_update_prob table"	2013-06-14 17:07:19 -07:00
John Koleszar	0f7a66e962	Remove constant vp9_coef_update_prob table All elements of this table are equal to 252, so replace it with a single constant VP9_COEF_UPDATE_PROB. Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9	2013-06-14 15:12:31 -07:00
Jingning Han	0b7910b9ff	Merge "Enable sse2 version of sad8x4/4x8"	2013-06-14 13:15:49 -07:00
Jingning Han	c43af9a8a3	Enable sse2 version of sad8x4/4x8 The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1	2013-06-14 09:19:28 -07:00
Deb Mukherjee	4ad96115cd	Some cleanups in rd motion search No bitstream or output change - only cosmetics. Change-Id: Ic8c1d7ad010a87dcf27d12a38cd7dd5adba683a7	2013-06-13 17:25:23 -07:00
Jingning Han	15f50e7b42	Enable sse2 version of sad8x4/4x8 The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1	2013-06-13 16:18:18 -07:00
Ronald S. Bultje	fa96eeb835	Implement SSE version for sad4x8x4d and SSE2 version for sad8x4x4d. Encoding time of crew (CIF, first 50 frames) @ 1500kbps goes from 4min56 to 4min42. Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f	2013-06-12 17:40:01 -04:00
Ronald S. Bultje	b55f8b696a	Merge "Fix row tiling."	2013-06-12 12:41:57 -07:00
John Koleszar	ad3b12f857	Merge "Fix chroma output when scaling"	2013-06-12 12:39:10 -07:00
Ronald S. Bultje	8a0808a145	Fix row tiling. Change-Id: I57be4eeaea6e4402f6a0cc04f5c6b7a5d9aedf9b	2013-06-12 13:42:59 -04:00
John Koleszar	01016ff9a6	Fix chroma output when scaling The encode-side scaling was not indexing through the image correctly for the chroma planes, causing a green checkerboard-like output in the unit test. Change-Id: I9abbd73615404cd6699588be3e64dcf59005bc14	2013-06-12 10:11:53 -07:00
John Koleszar	d0ed677a34	Merge branch 'master' into experimental Change-Id: Ie648398b82f7311143709f55c0e30ba452f50eff	2013-06-11 16:29:28 -07:00
Deb Mukherjee	e3d3ace314	Merge "Minor change in forward updates" into experimental	2013-06-11 12:48:41 -07:00
Deb Mukherjee	a4d906c132	Minor change in forward updates Removes the case of coding prob = 0 for forward updates, since that is not an allowed probability to code. Slightly improves efficiency but may not matter in practice. Change-Id: I3b4caf82e8f0891992f0706d4089cc5a27568dba	2013-06-11 10:33:07 -07:00
Jim Bankoski	fca6c82b29	Fix rd partition search for corner blocks This commit enables proper partition type search for the bottom-right corner blocks. Change-Id: Id1123d0e4e81eba648ed4f3c0c7ab587e174f650	2013-06-11 09:29:21 -07:00
Deb Mukherjee	f18328cbf1	Adds a zero check in model_rd function Avoids divide-by-zero when variance is 0. Change-Id: I3c7f526979046ff7d17714ce960fe81d6e1442a0	2013-06-10 17:04:47 -07:00
John Koleszar	9b78ed8229	Merge "Using network byte order (big-endian) to encode tile size." into experimental	2013-06-10 16:48:11 -07:00
Deb Mukherjee	51a7c7631d	Merge "New probs for filters/tx_size and a few others" into experimental	2013-06-10 16:39:43 -07:00
Deb Mukherjee	a43ff15399	New probs for filters/tx_size and a few others * New probs for subpel filters/tx_count * Makes a change to not reset to defaults for the tx_size probs if an intermediate frame reverts to using a fixed tx_size. * A few updates to the parameters for backward adaptation for mode/mv * some cosmetic cleanups derf300: +0.06% Change-Id: I22994d659bc31ca7a4fc8820fde24001e64a2920	2013-06-10 16:38:47 -07:00
Dmitry Kovalev	85381e3416	Using network byte order (big-endian) to encode tile size. This is consistent with uncompressed header encoding. Change-Id: Iccf40a44b493ed36ee085b81ed56f7952cde70a9	2013-06-10 16:13:08 -07:00
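
Commit 85381e3416 above switches the tile size field to network byte order. As a rough illustration of what that means, here is a minimal sketch in C. The helper names write_be32 and read_be32 are hypothetical, and the 4-byte field width is an assumption made for the example; neither is taken from the commit itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libvpx function): store a 32-bit value in
 * network byte order, i.e. most significant byte first. The 4-byte width
 * is an assumption for this sketch. */
static void write_be32(uint8_t *dst, uint32_t value) {
  assert(dst != NULL);
  dst[0] = (uint8_t)(value >> 24);
  dst[1] = (uint8_t)(value >> 16);
  dst[2] = (uint8_t)(value >> 8);
  dst[3] = (uint8_t)(value);
}

/* Hypothetical helper: read back a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *src) {
  assert(src != NULL);
  return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
         ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}
```

Writing the bytes with explicit shifts, rather than copying the integer's memory, gives the same byte layout on little-endian and big-endian hosts.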

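The two-pass quantization described in commit b5bf7b13a8 can be sketched roughly as follows. This is a simplified illustration, not the libvpx code: the function name two_pass_quantize, the caller-supplied scratch index buffer, the single zbin threshold and the plain division by a step size are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified two-pass quantizer (not the libvpx function).
 * coeff:   n input transform coefficients
 * zbin:    assumed single dead-zone threshold for all positions
 * step:    assumed uniform quantizer step size
 * qcoeff:  n output quantized coefficients
 * scratch: caller-supplied buffer of at least n ints
 * Returns the number of coefficients that went through pass 2. */
static int two_pass_quantize(const int16_t *coeff, int n, int zbin, int step,
                             int16_t *qcoeff, int *scratch) {
  int count = 0;
  assert(step > 0);

  /* Pass 1: zero the output and remember only the positions whose
   * magnitude reaches the zbin threshold. */
  for (int i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) scratch[count++] = i;
  }

  /* Skip check: every coefficient is inside the dead zone. */
  if (count == 0) return 0;

  /* Pass 2: quantize only the surviving subset, restoring the sign. */
  for (int k = 0; k < count; ++k) {
    const int i = scratch[k];
    const int mag = abs(coeff[i]) / step;
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -mag : mag);
  }
  return count;
}
```

The point of the split is that the cheap threshold test touches every coefficient, while the more expensive quantization arithmetic runs only on the (often small) subset that survives it.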

1055 Commits