generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	a272ff25cd	WIP: 16x16 idct/recon merge This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1	2013-05-15 13:16:02 -04:00
Scott LaVarnway	2cf0d4be12	WIP: 32x32 idct/recon merge This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a	2013-05-14 15:54:17 -07:00
Dmitry Kovalev	effaa3263d	Removing unused simple loopfilter code. Change-Id: Ic11dc052fb641687c015e1bbc37181b9babcd43e	2013-05-10 11:04:43 -07:00
Johann	32a5c52856	Merge branch 'master' into experimental Conflicts: vp9/common/vp9_findnearmv.c vp9/common/vp9_rtcd_defs.sh vp9/decoder/vp9_decodframe.c vp9/decoder/x86/vp9_dequantize_sse2.c vp9/encoder/vp9_rdopt.c vp9/vp9_common.mk Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f	2013-04-26 12:57:10 -07:00
Johann	c5b127afea	Rename vp9_idct_x86.c Remove similarly named header file. It is obsolete. Move file to match naming style. Adjust make file to include the file correctly and remove extra unnecessary #if guard. Change-Id: Ifba07ba9938a5df08a9f4eda54a3ac4d6983f7bf	2013-04-25 11:13:02 -07:00
John Koleszar	d12376aa2c	Move dst to per-plane MACROBLOCKD data First in a series of commits moving the framebuffers pointers to per-plane data, so that they can be indexed numerically rather than by name. Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775	2013-04-19 16:16:10 -07:00
John Koleszar	5b8a7d6e25	Use SSSE3 for 2d filters larger than 16 The C code was being used as a fallback for the >16 case, but only for 2D. Change-Id: I1e2e6da9e4b28bd88bde9ba4dd32724ce466cf6f	2013-04-19 09:51:16 -07:00
Jingning Han	f0b065e946	Merge "Make the use of pred buffers consistent in MB/SB" into experimental	2013-04-18 15:24:55 -07:00
Jingning Han	6f43ff5824	Make the use of pred buffers consistent in MB/SB Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction. This makes the macroblock buffer handling consistent with those of superblock. Remove predictor buffer from MACROBLOCKD. Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df	2013-04-18 14:59:36 -07:00
John Koleszar	a9ebbcc338	convolve: support larger blocks, fix asm saturation bug Updates the common convolution code to support blocks larger than 16x16, and rectangular blocks. This uncovered a bug in the SSSE3 filtering routines due to the order of application of saturation. This commit fixes that bug, adjusts the unit test to bias its random values towards the extremes, and adds a test to ensure that all filters conform to the expected pairwise addition structure. Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0	2013-04-18 13:57:59 -07:00
Yaowu Xu	421ad3f1b1	clean out experiments that are related to using reconstructed pixel for selecting reference motion vectors. Change-Id: I048dfae39ca7385e344b57d46347ecc6e753e1bb	2013-04-17 11:00:46 -07:00
John Koleszar	7f7d1357a2	Merge branch 'experimental' into master VP9 preview bitstream 2, commit '868ecb55a1528ca3f19286e7d1551572bf89b642' Conflicts: vp9/vp9_common.mk Change-Id: I3f0f6e692c987ff24f98ceafbb86cb9cf64ad8d3	2013-04-16 06:49:46 -07:00
John Koleszar	42db454c7f	Merge branch 'master' into experimental Conflicts: vp9/vp9_common.mk Change-Id: I2cd5ab47dc31c4210cefc23a282102123d5e2221	2013-04-02 14:54:44 -07:00
Johann	3db60c8c6c	Demux vp9_loopfilter_x86.c Allow more careful targeting of compiler flags. Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8	2013-04-02 12:49:04 -07:00
Johann	6c147b9d93	vp9_sadmxn_x86 only contains SSE2 functions Rename the file and clean up includes. In the future we would like to pattern match the files which need additional compiler flags. Change-Id: I2c76256467f392a78dd4ccc71e6e0a580e158e56	2013-04-02 11:20:55 -07:00
Yunqing Wang	c6c0657c60	Modify idct code to use macro Small modification of idct code. Change-Id: I5c4e3223944c68e4ccf762f6cf07c990250e4290	2013-03-27 12:36:08 -07:00
Yunqing Wang	21a718d9a7	Optimize 32x32 idct function Wrote sse2 version of vp9_short_idct_32x32 function. Compared to c version, the sse2 version is 5X faster. Change-Id: I071ab7378358346ab4d9c6e2980f713c3c209864	2013-03-27 11:05:42 -07:00
Yunqing Wang	869d6c0534	Optimize 16x16 idct10 function Wrote sse2 version of vp9_short_idct10_16x16 function. Compared to c version, the sse2 version is 2.3X faster. Change-Id: I314c4f09369648721798321eeed6f58e38857f26	2013-03-21 16:36:01 -07:00
Yunqing Wang	8a3233b54d	Merge "Optimize 16x16 idct function" into experimental	2013-03-21 11:54:20 -07:00
Yunqing Wang	ec3100661c	Optimize 16x16 idct function Wrote sse2 version of vp9_short_idct16x16 function. Compared to c version, the sse2 version is over 2.5X faster. Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61	2013-03-21 11:44:05 -07:00
Dmitry Kovalev	56f3a2c663	Code cleanup: lower case variable names. Renaming Width to width, Height to height and Version to version in several structs and function signatures. Change-Id: I084c3f7e747cb2ce3345aff27a3dff9b13a87543	2013-03-20 16:41:30 -07:00
Yunqing Wang	6344c84c82	Optimize 8x8 idct function Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to c version, the sse2 version is 2X faster. The decoder test didn't show noticeable gain since 8x8 idct doesn't take much of decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3	2013-03-18 15:34:14 -07:00
Yaowu Xu	005552639b	removed reference to "LLM" and "x8" The commit changed the names of files and functions to remove obsolete references to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516	2013-03-13 08:35:46 -07:00
Yunqing Wang	11ca81f8b6	Add vp9_idct4_1d_sse2 Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also, modified the parameter type passed to vp9_short_iht functions to make it work with rtcd prototype. Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe	2013-03-08 15:04:22 -08:00
Yunqing Wang	37932d9168	Merge "Optimize vp9_short_idct4x4llm function" into experimental	2013-03-04 14:13:31 -08:00
Yunqing Wang	e8bc9f4220	Optimize vp9_short_idct4x4llm function Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder performance. Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000	2013-03-04 12:01:27 -08:00
John Koleszar	69c67c9531	Merge master branch into experimental Picks up some build system changes, compiler warning fixes, etc. Change-Id: I2712f99e653502818a101a72696ad54018152d4e	2013-03-01 11:06:05 -08:00
Yunqing Wang	d6ff6fe2ed	Merge "Remove unused file" into experimental	2013-02-27 11:58:29 -08:00
Yunqing Wang	5ef694cfb8	Remove unused file Removed vp9_idctllm_mmx.asm Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe	2013-02-27 11:00:58 -08:00
Jan Kratochvil	82ed3f9a41	Fix --as=nasm compatibility for new asm code. s/movd/movq/ Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626	2013-02-27 09:55:38 -08:00
Yunqing Wang	35bc02c6eb	Optimize vp9_dc_only_idct_add_c function Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d	2013-02-26 17:16:13 -08:00
Scott LaVarnway	30f866f44b	WIP: ssse3 version of convolve avg functions Initial ssse3 convolve avg functions; this is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9	2013-02-13 09:15:38 -08:00
Scott LaVarnway	eda30b410e	Bug fix: ssse3 version of subpixel did not match C code A 16 bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters. (vp9_sub_pel_filters_8lp) Changed the order of the adds to fix this problem. Also added ssse3 support for 4x4 subpixel filtering. Change-Id: I475eaadae920794c2de5e01e9735c059a856518e	2013-02-09 15:15:14 -08:00
John Koleszar	3de8ee6ba1	Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental * changes: Restore SSSE3 subpixel filters in new convolve framework Convert subpixel filters to use convolve framework Add 8-tap generic convolver	2013-02-08 13:19:47 -08:00
John Koleszar	29d47ac80e	Restore SSSE3 subpixel filters in new convolve framework This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c	2013-02-08 12:18:14 -08:00
Ronald S. Bultje	aac73df1a7	Use configure checks for various inline keywords. Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24	2013-02-06 16:12:56 -08:00
John Koleszar	7a07eea13f	Convert subpixel filters to use convolve framework Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus for whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR, 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff	2013-02-05 14:23:17 -08:00
Scott LaVarnway	6a997400ff	Intrinsic version of loopfilter now matches C code Updated the intrinsic code to match Yaowu's latest loopfilter change. (I584393906c4f5f948a581d6590959522572743bb) The decoder performance improved by ~30% for the test clip used. Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2	2013-01-23 09:31:40 -08:00
Yaowu Xu	9bf73f46f9	fix a number of issues that cause failures during master Jenkins verification process Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89	2013-01-14 18:32:32 -08:00
Yaowu Xu	f7dab60096	Merge experiment "widerlpf" Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86	2013-01-14 15:17:35 -08:00
Jim Bankoski	e42b280e11	Merge "WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w" into experimental	2013-01-11 17:15:41 -08:00
Scott LaVarnway	b20ce07d76	WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w and vp9_mb_lpf_vertical_edge_w_sse2. This was quickly done so we can run some tests over the weekend. Future commits will optimize/refactor these functions further. The decoder performance improved by ~17% for the clip used. Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76	2013-01-11 17:11:04 -08:00
Yaowu Xu	bbe1c9257f	Merge "Add loop filtering for UV plane" into experimental	2013-01-11 16:56:39 -08:00
Yaowu Xu	9a1d73d036	Add loop filtering for UV plane On block boundary within a MB when 8x8 block boundary only is filtered for Y. Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1	2013-01-11 16:32:06 -08:00
Scott LaVarnway	4987c0f07e	Initial sse2 version of the wide loopfilters Updated the rtcd_defs and used the sse2 uv version of the loopfilter. The performance improved by ~8% for the test clip used. Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36	2013-01-11 14:54:14 -08:00
Yunqing Wang	f1c56a8c8c	Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental	2013-01-08 12:59:08 -08:00
Yunqing Wang	8d568312a2	vp9_sub_pixel_variance16x2 SSE2 optimization About 5% decoder speedup. Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794	2013-01-08 12:01:55 -08:00
John Koleszar	879cb7d962	Merge vp9-preview changes into experimental branch Incorporate vp9-preview changes by merging master branch into experimental. Conflicts: test/test.mk vp9/common/vp9_filter.c vp9/common/vp9_idctllm.c vp9/common/vp9_invtrans.h vp9/common/vp9_mbpitch.c vp9/common/vp9_rtcd_defs.sh vp9/common/vp9_systemdependent.h vp9/common/vp9_type_aliases.h vp9/common/x86/vp9_asm_stubs.c vp9/common/x86/vp9_subpixel_mmx.asm vp9/decoder/vp9_decodframe.c vp9/decoder/vp9_dequantize.c vp9/decoder/vp9_dequantize.h vp9/decoder/vp9_onyxd_int.h vp9/encoder/vp9_bitstream.c vp9/encoder/vp9_encodeframe.c vp9/encoder/vp9_rdopt.c Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5	2013-01-08 10:19:59 -08:00
John Koleszar	5ebe94f9f1	Build fixes to merge vp9-preview into master Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07	2012-12-26 11:21:09 -08:00
Scott LaVarnway	89ac94f8fb	Removed mmx versions of vp9_bilinear_predict filters These filters will not work with VP9. Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85	2012-12-21 14:41:49 -08:00


71 Commits