generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	466f395148	Merge "Removing extra params from x_add_residual() functions" into experimental	2013-04-16 08:58:28 -07:00
Scott LaVarnway	6f95d53e37	Removing extra params from x_add_residual() functions Now that the predictor is the dest, we do not need the extra parameters. Change-Id: I31e2c3d2015f4a1cd12e7f04536d8db478582a0a	2013-04-16 09:59:01 -04:00
Scott LaVarnway	5393379c84	Merge "Removing extra params in dequant functions" into experimental	2013-04-16 06:37:00 -07:00
Jingning Han	aaf33d7df5	Add rectangular block size variance/sad functions. With this, the RD loop properly supports rectangular blocks. Change-Id: Iece79048fb4e84741ee1ada982da129a7bf00470	2013-04-15 13:39:07 -07:00
Scott LaVarnway	74610b1ae4	Removing extra params in dequant functions Now that the predictor is the dest, we do not need the extra parameters. Change-Id: I78db73d39b5aff62f15303f3d51ad2797eae74b6	2013-04-15 13:43:11 -04:00
Jingning Han	815e95fbeb	Make intra predictor support rectangular blocks The intra predictor supports configurable block sizes. It can handle intra prediction down to 4x4 sizes, when enabled in BLOCK_SIZE_TYPE. Change-Id: I7399ec2512393aa98aadda9813ca0c83e19af854	2013-04-11 16:45:57 -07:00
John Koleszar	2f19cd03aa	Merge "Remove unused vp9_recon_mb{y,uv}_s" into experimental	2013-04-11 15:51:20 -07:00
John Koleszar	c382ed09f8	Remove unused vp9_recon_mb{y,uv}_s These functions now are handled through the common superblock code. Change-Id: Ib6688971bae297896dcec42fae1d3c79af7a611c	2013-04-11 14:05:59 -07:00
Scott LaVarnway	6189f2bcb1	WIP: removing predictor buffer usage from decoder This patch will use the dest buffer instead of the predictor buffer. This will allow us in future commits to remove the extra mem copy that occurs in the dequant functions when eob == 0. We should also be able to remove extra params that are passed into the dequant functions. Change-Id: I7241bc1ab797a430418b1f3a95b5476db7455f6a	2013-04-11 13:55:18 -07:00
Ronald S. Bultje	b4f6098ef7	Make RD superblock mode search size-agnostic. Merge various super_block_yrd and super_block_uvrd versions into one common function that works for all sizes. Make transform size selection size-agnostic also. This fixes a slight bug in the intra UV superblock code where it used the wrong transform size for txsz > 8x8, and stores the txsz selection for superblocks properly (instead of forgetting it). Lastly, it removes the trellis search that was done for 16x16 intra predictors, since trellis is relatively expensive and should thus only be done after RD mode selection. Gives basically identical results on derf (+0.009%). Change-Id: If4485c6f0a0fe4038b3172f7a238477c35a6f8d3	2013-04-10 16:50:30 -07:00
Ronald S. Bultje	a3874850dd	Make SB coding size-independent. Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code gives identical encoder results before and after. There are a few macros for rectangular block sizes under the sbsegment experiment; this experiment is not yet functional and should not yet be used. Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728	2013-04-09 21:28:27 -07:00
John Koleszar	4c05a051ab	Move qcoeff, dqcoeff from BLOCKD to per-plane data Start grouping data per-plane, as part of refactoring to support additional planes, and chroma planes with other-than 4:2:0 subsampling. Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23	2013-04-04 16:30:57 -07:00
Yunqing Wang	0e91bec4b5	Merge "Optimize 32x32 idct function" into experimental	2013-03-27 11:30:48 -07:00
Yunqing Wang	21a718d9a7	Optimize 32x32 idct function Wrote sse2 version of vp9_short_idct_32x32 function. Compared to c version, the sse2 version is 5X faster. Change-Id: I071ab7378358346ab4d9c6e2980f713c3c209864	2013-03-27 11:05:42 -07:00
Deb Mukherjee	23144d2345	Implicit weighted prediction experiment Adds an experiment to use a weighted prediction of two INTER predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8), (1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly based on consistency of the predictors to the already reconstructed pixels to the top and left of the current macroblock or superblock. Currently the weighting is not applied to SPLITMV modes, which default to the usual (1/2, 1/2) weighting. However the code is in place controlled by a macro. The same weighting is used for Y and UV components, where the weight is derived from analyzing the Y component only. Results (over compound inter-intra experiment) derf: +0.18% yt: +0.34% hd: +0.49% stdhd: +0.23% The experiment suggests bigger benefit for explicitly signaled weights. Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a	2013-03-26 16:58:56 -07:00
Yunqing Wang	869d6c0534	Optimize 16x16 idct10 function Wrote sse2 version of vp9_short_idct10_16x16 function. Compared to c version, the sse2 version is 2.3X faster. Change-Id: I314c4f09369648721798321eeed6f58e38857f26	2013-03-21 16:36:01 -07:00
Yunqing Wang	ec3100661c	Optimize 16x16 idct function Wrote sse2 version of vp9_short_idct16x16 function. Compared to c version, the sse2 version is over 2.5X faster. Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61	2013-03-21 11:44:05 -07:00
Yunqing Wang	6344c84c82	Optimize 8x8 idct function Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to c version, the sse2 version is 2X faster. The decoder test didn't show noticeable gain since 8x8 idct doesn't take much of decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3	2013-03-18 15:34:14 -07:00
Yaowu Xu	12ade55719	Merge "removed reference to "LLM" and "x8"" into experimental	2013-03-18 08:51:19 -07:00
Christian Duvivier	4418b790a7	Faster vp9_short_fdct16x16. Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289	2013-03-15 15:55:31 -07:00
Yaowu Xu	005552639b	removed reference to "LLM" and "x8" The commit changed the name of files and function to remove obsolete reference to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516	2013-03-13 08:35:46 -07:00
Yunqing Wang	11ca81f8b6	Add vp9_idct4_1d_sse2 Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also, modified the parameter type passed to vp9_short_iht functions to make it work with rtcd prototype. Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe	2013-03-08 15:04:22 -08:00
Yunqing Wang	f240782650	Optimize add_constant_residual function Optimized adding constant diff to predictor, which gave about 2% decoder performance gain. Change-Id: I47db20c31428e8c4a8f16214a85cbe386a6e9303	2013-03-07 15:49:07 -08:00
Yunqing Wang	f4e383f3d1	Merge "Optimize add_residual function" into experimental	2013-03-05 16:47:58 -08:00
Yunqing Wang	943c6d7172	Optimize add_residual function Optimized adding diff to predictor, which gave 0.8% decoder performance gain. Change-Id: Ic920f0baa8cbd13a73fa77b7f9da83b58749f0f8	2013-03-05 16:27:45 -08:00
Ronald S. Bultje	4209bba462	Merge changes Ifacbf5a0,Ibad7c3dd into experimental * changes: vpxenc: actually report mismatch on stderr. Make superblocks independent of macroblock code and data.	2013-03-05 11:17:14 -08:00
Ronald S. Bultje	111ca42133	Make superblocks independent of macroblock code and data. Split macroblock and superblock tokenization and detokenization functions and coefficient-related data structs so that the bitstream layout and related code of superblock coefficients looks less like it's a hack to fit macroblocks in superblocks. In addition, unify chroma transform size selection from luma transform size (i.e. always use the same size, as long as it fits the predictor); in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma transform will now use the 16x16 (instead of the 8x8) chroma transform, and 64x64 superblocks using the 32x32 luma transform will now use the 32x32 (instead of the 16x16) chroma transform. Lastly, add a trellis optimize function for 32x32 transform blocks. HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's a few negative points here and there that I might want to analyze a little closer. Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430	2013-03-04 16:34:36 -08:00
Yunqing Wang	37932d9168	Merge "Optimize vp9_short_idct4x4llm function" into experimental	2013-03-04 14:13:31 -08:00
Yunqing Wang	e8bc9f4220	Optimize vp9_short_idct4x4llm function Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder performance. Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000	2013-03-04 12:01:27 -08:00
John Koleszar	1cfc86ebe0	Add unit test for x4 multi-SAD functions Update the function prototypes to match between VP9 and VP8. Change-Id: If58965073989e87df3b62b67a030ec6ce23ca04f	2013-03-01 18:14:02 -08:00
John Koleszar	69c67c9531	Merge master branch into experimental Picks up some build system changes, compiler warning fixes, etc. Change-Id: I2712f99e653502818a101a72696ad54018152d4e	2013-03-01 11:06:05 -08:00
Yunqing Wang	c550bb3b09	Add eob<=10 case in idct32x32 Simplified idct32x32 calculation when there are only 10 or less non-zero coefficients in 32x32 block. This helps the decoder performance. Change-Id: If7f8893d27b64a9892b4b2621a37fdf4ac0c2a6d	2013-02-28 16:40:29 -08:00
Yunqing Wang	72b146690a	Merge "Refactor vp9_dequant_idct_add function" into experimental	2013-02-28 14:34:27 -08:00
Yunqing Wang	6193bc3ba8	Refactor vp9_dequant_idct_add function Provided a wrapper and removed duplicate code. Change-Id: Iaef842226ec348422e459202793b001d0983ea30	2013-02-28 14:18:46 -08:00
Scott LaVarnway	aa8fb070b8	Removed vp9_dequantize_b Change-Id: Ie89bd00d58e30bf4094cb748a282f1dfa81a31d8	2013-02-28 14:08:12 -08:00
Jim Bankoski	714aa9f3c0	this commit converts all sad ptrs to uint32 sse4_1 code used uint16_t for returning sad, but that won't work for 32x32 or 64x64. This code fixes the assembly for those and also reenables sse4_1 on linux Change-Id: I5ce7288d581db870a148e5f7c5092826f59edd81	2013-02-28 08:46:35 -08:00
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
John Koleszar	5ac141187a	Merge "Remove unused vp9_copy32xn" into experimental	2013-02-27 12:23:45 -08:00
John Koleszar	7ad8dbe417	Remove unused vp9_copy32xn This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d	2013-02-27 10:24:56 -08:00
Yunqing Wang	35bc02c6eb	Optimize vp9_dc_only_idct_add_c function Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d	2013-02-26 17:16:13 -08:00
Jingning Han	77a3becf92	clean up forward and inverse hybrid transform Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f	2013-02-25 09:16:12 -08:00
James Zern	e5fb6321a1	give vp9 variance struct a unique name variance_vtable clashed with vp8/common/variance.h Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f	2013-02-22 16:25:13 -08:00
Jingning Han	babbd5d170	Forward butterfly hybrid transform This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1	2013-02-21 18:24:28 -08:00
Ronald S. Bultje	35524e2231	Remove "eobs" array in MACROBLOCKD. The information is a duplicate of "eob" in BLOCKD. Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3	2013-02-21 10:07:36 -08:00
Yaowu Xu	d262e26cc7	Merge lossless experiment Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa	2013-02-20 07:54:28 -08:00
Jingning Han	cd907b1601	16x16 butterfly inverse ADST/DCT hybrid transform rebased. This patch includes 16x16 butterfly inverse ADST/DCT hybrid transform. It uses the variant ADST of kernel sin((2k+1)*(2n+1)/4N), which allows a butterfly implementation. The coding gains as compared to DCT 16x16 are about 0.1% for both derf and std-hd. It is noteworthy that for std-hd sets many sequences gains about 0.5%, some 0.2%. There are also few points that provides -1% to -3% performance. Hence the average goes to about 0.1%. Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17	2013-02-19 09:07:00 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
Scott LaVarnway	7755657ea7	Merge "WIP: ssse3 version of convolve avg functions" into experimental	2013-02-15 07:54:21 -08:00
Yaowu Xu	d3de97794f	Merge "fix the lossless experiment" into experimental	2013-02-13 09:54:35 -08:00
Yaowu Xu	16f25f9dc8	fix the lossless experiment Change-Id: I95acfc1417634b52d344586ab97f0abaa9a4b256	2013-02-13 09:20:26 -08:00

245 Commits
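
Note on commit 714aa9f3c0: the message says a uint16_t SAD return value "won't work for 32x32 or 64x64". The arithmetic behind that can be sketched as below, assuming 8-bit samples so that each absolute difference is at most 255. The helper names (`max_sad`, `fits_in_uint16`, `sad`) are hypothetical and are not taken from the libvpx source; this is only an illustration of the bound, not the codec's implementation.

```python
# Worst-case sum of absolute differences (SAD) per block size.
# Assumption: 8-bit samples, so each |src - ref| is at most 255.
MAX_PIXEL_DIFF = 255
UINT16_MAX = 0xFFFF  # largest value a 16-bit unsigned return can hold


def max_sad(width: int, height: int) -> int:
    """Largest possible SAD for a width x height block of 8-bit samples."""
    return width * height * MAX_PIXEL_DIFF


def fits_in_uint16(width: int, height: int) -> bool:
    """True if the worst-case SAD can be returned in a 16-bit unsigned int."""
    return max_sad(width, height) <= UINT16_MAX


def sad(src, ref) -> int:
    """Plain reference SAD over two equally sized 2-D lists of samples."""
    return sum(
        abs(s - r)
        for src_row, ref_row in zip(src, ref)
        for s, r in zip(src_row, ref_row)
    )


if __name__ == "__main__":
    for size in (16, 32, 64):
        print(size, max_sad(size, size), fits_in_uint16(size, size))
```

Under these assumptions a 16x16 block tops out at 65280, which still fits in 16 bits, while 32x32 (261120) and 64x64 (1044480) do not, which is consistent with the move to uint32 described in the commit.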