generic-library/vpx

Author	SHA1	Message	Date
Dmitry Kovalev	9d771f948f	Merge "Motion vectors code cleanup." into experimental	2013-02-27 13:34:56 -08:00
Yunqing Wang	bbc7b6a86a	Merge "Remove unused file" into experimental	2013-02-27 13:00:10 -08:00
John Koleszar	5ac141187a	Merge "Remove unused vp9_copy32xn" into experimental	2013-02-27 12:23:45 -08:00
Yunqing Wang	d6ff6fe2ed	Merge "Remove unused file" into experimental	2013-02-27 11:58:29 -08:00
Dmitry Kovalev	0c0de00217	Motion vectors code cleanup. Fixing indentation, removing redundant parenthesis, deciphering single letter variable names, better spacing. Change-Id: I1d447a7d69eddbf1e94e0820423615f40ea2d591	2013-02-27 11:48:13 -08:00
Ronald S. Bultje	90932399b4	Merge "Move eob from BLOCKD to MACROBLOCKD." into experimental	2013-02-27 11:39:16 -08:00
Yunqing Wang	8092aaf9ec	Merge "Optimize vp9_dc_only_idct_add_c function" into experimental	2013-02-27 11:38:45 -08:00
John Koleszar	09be534f13	Merge "give vp9 variance struct a unique name"	2013-02-27 11:22:36 -08:00
Yunqing Wang	bf6cca44ad	Remove unused file Removed vp9/decoder/x86/vp9_idct_blk_mmx.c Change-Id: I07ab06382a394cf556fa5a8e3c98b91f6e4f9ce8	2013-02-27 11:13:19 -08:00
Yunqing Wang	5ef694cfb8	Remove unused file Removed vp9_idctllm_mmx.asm Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe	2013-02-27 11:00:58 -08:00
Ronald S. Bultje	e8c74e2b70	Move eob from BLOCKD to MACROBLOCKD. Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07	2013-02-27 11:00:55 -08:00
John Koleszar	0921bfb749	Merge "Use ref_frame_map vice active_ref_idx on the encoder" into experimental	2013-02-27 10:59:08 -08:00
John Koleszar	9615fd8f39	Merge "Test upscaling as well as downscaling" into experimental	2013-02-27 10:25:51 -08:00
John Koleszar	7ad8dbe417	Remove unused vp9_copy32xn This function was part of an optimization used in VP8 that required caching two macroblocks. This is unused in VP9, and might not survive refactoring to support superblocks, so removing it for now. Change-Id: I744e585206ccc1ef9a402665c33863fc9fb46f0d	2013-02-27 10:24:56 -08:00
John Koleszar	d8e68bd14b	Merge changes I922f8602,I0ac3343d into experimental * changes: Use 256-byte aligned filter tables Set scale factors consistently for SPLITMV	2013-02-27 10:08:53 -08:00
Jan Kratochvil	82ed3f9a41	Fix --as=nasm compatibility for new asm code. s/movd/movq/ Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626	2013-02-27 09:55:38 -08:00
John Koleszar	350ba5f30e	Merge "Combined motion compensation with scaled predictors" into experimental	2013-02-27 09:46:12 -08:00
John Koleszar	800ad0b886	Use ref_frame_map vice active_ref_idx on the encoder This patch makes the encoder's use of ref_frame_map and active_ref_idx consistent with the decoder. ref_frame_map[] maps a reference buffer index to its actual location in the yv12_fb array, since many references may share an underlying buffer. active_ref_idx[] mirrors cpi->{lst,gld,alt}_fb_idx, holding the active references in each slot. This also fixes a bug in setup_buffer_inter() where the incorrect reference was used to populate the scaling factors. Change-Id: Id3728f6d77cffcd27c248903bf51f9c3e594287e	2013-02-27 08:22:40 -08:00
John Koleszar	b683eecf6d	Test upscaling as well as downscaling Fixes a bug in vp9_set_internal_size() that prevented returning to the unscaled state. Updated the ResizeInternalTest to scale both down and up. Added a check that all frames are within 2.5% of the quality of the initial keyframe. Change-Id: I3b7ef17cdac144ed05b9148dce6badfa75cff5c8	2013-02-27 08:22:40 -08:00
John Koleszar	6fd7dd1a70	Use 256-byte aligned filter tables This avoids duplicating all the filters twice. Includes fixups to the convolve routines and associated tests to make this work. Change-Id: I922f86021594e55072ddb63b42b2313605db6e00	2013-02-27 08:22:39 -08:00
John Koleszar	77f88e97fa	Combined motion compensation with scaled predictors This patch extends the previous support for using references of a different resolution in ZEROMV mode to all inter prediction modes. Subpixel based best-mv scoring is disabled when the reference frame differs in resolution from the current frame. Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac	2013-02-27 08:22:39 -08:00
John Koleszar	472eeaf082	Set scale factors consistently for SPLITMV This commit updates the 4x4 prediction to consistently use the build_2x1_inter_predictor() method. That function is updated to calculate the scale offset, rather than relying on the caller to calculate it. In the case that the 2x1 prediction can not be used, the scale offset is recalculated for each 1x1 block. The idea here is that the offsets are calculated before each call to vp9_build_scaled_inter_predictor(). Change-Id: I0ac3343dd54e2846efa3c4195fcd328b709ca04d	2013-02-27 08:22:39 -08:00
Yaowu Xu	858b60e8d0	Merge "Improve 32x32 forward dct" into experimental	2013-02-27 07:56:42 -08:00
John Koleszar	eb939f45b8	Spatial resampling of ZEROMV predictors This patch allows coding frames using references of different resolution, in ZEROMV mode. For compound prediction, either reference may be scaled. To test, I use the resize_test and enable WRITE_RECON_BUFFER in vp9_onyxd_if.c. It's also useful to apply this patch to test/i420_video_source.h: --- a/test/i420_video_source.h +++ b/test/i420_video_source.h @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource { virtual void FillFrame() { // Read a frame from input_file. + if (frame_ != 3) if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) { limit_ = frame_; } This forces the frame that the resolution changes on to be coded with no motion, only scaling, and improves the quality of the result. Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496	2013-02-26 23:54:23 -08:00
Dmitry Kovalev	c7805395fd	Merge "Removing redundant 'extern' keyword from function declarations." into experimental	2013-02-26 20:56:32 -08:00
Ronald S. Bultje	96d260515a	Merge "Merge cnvcontext experiment." into experimental	2013-02-26 19:39:39 -08:00
Ronald S. Bultje	1a0533958b	Merge "Fix modes.stt output printf format string." into experimental	2013-02-26 19:39:33 -08:00
Ronald S. Bultje	db54e6774f	Merge "Minor cosmetics in rdopt." into experimental	2013-02-26 19:39:28 -08:00
Yunqing Wang	35bc02c6eb	Optimize vp9_dc_only_idct_add_c function Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d	2013-02-26 17:16:13 -08:00
James Zern	4446af78f0	Merge "vp9: promote gf_group_bits calculation to 64-bit" into experimental	2013-02-26 16:27:45 -08:00
Dmitry Kovalev	971ff2679f	Removing redundant 'extern' keyword from function declarations. Change-Id: I893fa36297b9bd9cff93d082f1736f6860b15c0d	2013-02-26 15:52:05 -08:00
John Koleszar	25686fc22d	Merge "Refactor inter recon functions to support scaling" into experimental	2013-02-26 11:45:28 -08:00
Dmitry Kovalev	998bed1d2c	Merge "Changing pitch value meaning for fht and iht transforms." into experimental	2013-02-26 10:44:15 -08:00
Ronald S. Bultje	b1641150b1	Merge cnvcontext experiment. Change-Id: I35e64998b25694a3bb4a62164bba3c03c1db4bc7	2013-02-26 10:40:15 -08:00
Ronald S. Bultje	f3fdb4c37d	Fix modes.stt output printf format string. Change-Id: I17e2d2f6a4da86d9e4af7bebdea0bf5d154da084	2013-02-26 10:40:15 -08:00
Ronald S. Bultje	71539eae2a	Minor cosmetics in rdopt. Change-Id: I62497dcf2074b4bb4787bf660e727e5cf1bf3472	2013-02-26 10:40:11 -08:00
Ronald S. Bultje	c4ae97911a	Merge "make cost_coeffs to use combined context" into experimental	2013-02-26 10:32:01 -08:00
John Koleszar	6a4f708c25	Refactor inter recon functions to support scaling Ensure that all inter prediction goes through a common code path that takes scaling into account. Removes a bunch of duplicate 1st/2nd predictor code. Also introduces a 16x8 mode for 8x8 MVs, similar to the 8x4 trick we were doing before. This has an unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that case for now. Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba	2013-02-26 10:03:29 -08:00
Yaowu Xu	66d94ac13c	Improve 32x32 forward dct The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53	2013-02-26 09:23:01 -08:00
Dmitry Kovalev	9bf3f75168	Changing pitch value meaning for fht and iht transforms. Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af	2013-02-25 18:19:55 -08:00
Yaowu Xu	ecb03e9a3f	make cost_coeffs to use combined context Change-Id: Ia15f4244595fab49bffda0c651a750a8a9481d28	2013-02-25 17:01:33 -08:00
Dmitry Kovalev	9770d564f4	Code cleanup. Removing switch statements for inverse hybrid transforms. Making code style consistent for all similar transform implementations. Renaming shortpitch and short_pitch variables to half_pitch. Change-Id: I875f7a82aae4e8063a58777bf1cc3f1e67b48582	2013-02-25 15:14:01 -08:00
Dmitry Kovalev	3171b69dee	Merge "Code cleanup." into experimental	2013-02-25 14:14:22 -08:00
Dmitry Kovalev	0287d20a05	Merge "Code cleanup." into experimental	2013-02-25 13:58:06 -08:00
Jingning Han	e7b67d33a9	Merge "Improving the forward 16x16 ADST/DCT accuracy" into experimental	2013-02-25 13:38:33 -08:00
Dmitry Kovalev	20b0cb599b	Code cleanup. Removing redundant parentheses, better code formatting, introducing ROUND_POWER_OF_TWO macro to replace repeated expression. Change-Id: I91aad7a53ed03482428b2419de4bb99fd92c6771	2013-02-25 13:38:18 -08:00
Dmitry Kovalev	ab196b7e9b	Code cleanup. Lower case names of variables. Removing redundant spaces, parentheses, casts, and variables. Change-Id: I55b80c55b7d5adca44c1e8adb40a124c0680f229	2013-02-25 13:33:56 -08:00
James Zern	b2fc3ca066	vp9: promote gf_group_bits calculation to 64-bit avoids signed integer overflow Change-Id: I9ffcdba90b21edb324d1b173fd11d613e0592931	2013-02-25 13:00:18 -08:00
Paul Wilkins	0e36158c70	Merge "Minor rate control refactoring and experiments." into experimental	2013-02-25 12:49:54 -08:00
Jingning Han	65821d6680	Improving the forward 16x16 ADST/DCT accuracy Increase the first stage dynamic range by 4 times, and reduce it back with proper rounding before applying the second stage. Hence it still fits in the given dynamic range and slightly improves the key frame coding performance. Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa	2013-02-25 12:13:37 -08:00
Jingning Han	77a3becf92	clean up forward and inverse hybrid transform Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f	2013-02-25 09:16:12 -08:00
Paul Wilkins	97da8b8c33	Minor rate control refactoring and experiments. Some minor refactoring of code relating to estimates of bits per MB at a given Q and estimating the allowed Q range. Most of the changes here were included in a previous commit. This commit seeks to separate out the refactoring from the more material changes. Two #define control flags have been added for experimentation. ONE_SHOT_Q_ESTIMATE forces the two pass encoder to use its initial Q range estimate for the whole clip even if this results in a miss on the target data rate. In effect this tightens the Q range seen at the expense of rate control accuracy. DISABLE_RC_LONG_TERM_MEM is a related flag that disables the long term memory in the rate control. Local adjustments are still made to try and better hit the rate target on a per frame basis but the impact of rate control misses is not propagated to the remainder of the clip. This means that for example an overshoot early on will not cause frames later in the clip to be starved of bits. Again the result of this relaxation may be less rate control accuracy especially on short clips. The flags are disabled by default for now. Change-Id: I7482f980146d8ea033b5d50cc689f772e4bd119e	2013-02-25 17:07:45 +00:00
Yaowu Xu	499fe05dc0	optimize forward 16x16 DCT for accuracy This commit added pre/post scaling for first half of fDCT16x16 to reduce error, by simulation of 100,000 blocks for random inputs, the average sse reduced from 2.1/block to 0.0498/block. also enabled tests for 16x16 fDCT and iDCT Change-Id: Id2a95f0464c6dd4118797d456237ae90274c0f02	2013-02-25 07:47:27 -08:00
Ronald S. Bultje	0c9e2e9a1d	Split coefficient token tables intra vs. inter. Change-Id: I5416455f8f129ca0f450d00e48358d2012605072	2013-02-23 07:33:46 -08:00
Paul Wilkins	c17672a33d	Further changes to coefficient contexts. This patch alters the balance of context between the coefficient bands (reflecting the position of coefficients within a transform block) and the energy of the previous token (or tokens) within a block. In this case the number of coefficient bands is reduced but more previous token energy bands are supported. Some initial rebalancing of the default tables has been done by running multiple derf clips at multiple data rates using the ENTROPY_STATS macro. Further balancing needs to be done using larger image formats, especially in regard to the bigger transform sizes which are not as well represented in encodings of smaller image formats. Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a	2013-02-23 07:29:09 -08:00
Yaowu Xu	bf0570a7e6	Merge "optimize 8x8 fdct rounding for accuracy" into experimental	2013-02-22 22:20:57 -08:00
Yaowu Xu	22012ee994	optimize 8x8 fdct rounding for accuracy The commit added a final rounding choice for 8x8 forward dct to get rid of a sign bias at DC position and improve the accuracy in terms of round trip error for 8x8 fDCT/iDCT. This commit also enabled forward 8x8 dct test. Change-Id: Ib67f99b0a24d513e230c7812bc04569d472fdc50	2013-02-22 16:55:30 -08:00
James Zern	e5fb6321a1	give vp9 variance struct a unique name variance_vtable clashed with vp8/common/variance.h Change-Id: I09c1de44d5519f1bd13f58c01144c0de4706de6f	2013-02-22 16:25:13 -08:00
James Zern	c21226b638	Merge "vp8: make gf_group_bits 64-bit"	2013-02-22 15:31:28 -08:00
James Zern	5e0724abad	Merge "vp8_first_pass(): avoid floating point div by 0"	2013-02-22 15:30:14 -08:00
James Zern	4e00060d29	vp8: make gf_group_bits 64-bit avoids signed integer overflow; matches kf_group_bits Change-Id: I193145cdc4fa53e70fba0a1731a03eb1a574931d	2013-02-22 12:45:28 -08:00
James Zern	fba9772dd2	vp8_first_pass(): avoid floating point div by 0 Change-Id: Id1e6a12db6b0c1d3f64ead8fd8834aadc30fbed2	2013-02-22 12:41:59 -08:00
Jingning Han	936aa281b5	Fixed the buffer overflow issue The issue that potentially broke the encoding process was due to the fact that the length of token link is calculated from the total number of tokens coded, while it is possible, in high bit-rate setting, this length is greater than the buffer length initially assigned to the cpi->tok. This patch increases the initially allocated buffer length assigned to cpi->tok from (mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)). It resolves the buffer overflow problem. Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df	2013-02-22 12:30:35 -08:00
John Koleszar	606a2561d6	Merge "Code cleanup." into experimental	2013-02-22 11:20:20 -08:00
Dmitry Kovalev	548b4dd5f2	Code cleanup. Removing redundant 'extern' keywords and parentheses, fixing indentation, making variable names lower case, using short expressions x += c instead of x = x + c, minor code simplifications. Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de	2013-02-22 11:03:14 -08:00
Jingning Han	c67a20994f	Merge "Forward butterfly hybrid transform" into experimental	2013-02-22 09:20:26 -08:00
Paul Wilkins	b5f3cb6e37	Merge "Experimental removal of over quant code" into experimental	2013-02-22 08:44:40 -08:00
Paul Wilkins	dbf4942046	Experimental removal of over quant code The over quant code was added in VP8 post bitstream freeze to allow compression to lower data rates. In VP9 the real quantizer range has been greatly extended anyway. Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526	2013-02-22 14:00:51 +00:00
Jingning Han	babbd5d170	Forward butterfly hybrid transform This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1	2013-02-21 18:24:28 -08:00
Dmitry Kovalev	5a18106fb7	Code cleanup. Removing redundant 'extern' keywords. Moving VP9DX_BOOL_DECODER from .h to .c file. Change-Id: I5a3056cb3d33db7ed3c3f4629675aa8e21014e66	2013-02-21 13:50:15 -08:00
Ronald S. Bultje	8c16dee4f2	Merge "Remove "eobs" array in MACROBLOCKD." into experimental	2013-02-21 11:30:29 -08:00
John Koleszar	4674312382	Merge "Code cleanup." into experimental	2013-02-21 10:56:17 -08:00
Dmitry Kovalev	5da8534963	Code cleanup. Removing redundant 'extern' keyword from function declarations and making function arguments lower case. Change-Id: Idae9a2183b067f2b6c85ad84738d275e8bbff9d9	2013-02-21 10:34:33 -08:00
Ronald S. Bultje	35524e2231	Remove "eobs" array in MACROBLOCKD. The information is a duplicate of "eob" in BLOCKD. Change-Id: Ia6416273bd004611da801e4bfa6e2d328d6f02a3	2013-02-21 10:07:36 -08:00
Deb Mukherjee	048f593703	Merge "Refactoring of switchable filter search for speed" into experimental	2013-02-21 09:23:50 -08:00
John Koleszar	138ffb6ea9	Merge "Avoid division in intra prediction" into experimental	2013-02-21 08:33:17 -08:00
Deb Mukherjee	28b1db9278	Refactoring of switchable filter search for speed Refactors the switchable filter search in the rd loop to improve encode speed. Uses a piecewise approximation to a closed form expression to estimate rd cost for a Laplacian source with a given variance and quantization step-size. About 40% encode time reduction is achieved. Results (on a feb 12 baseline) show a slight drop: derf: -0.019% yt: +0.010% std-hd: -0.162% hd: -0.050% Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0	2013-02-20 18:34:42 -08:00
Jingning Han	abfd2a4880	Merge "Fixed the buffer overflow issue" into experimental	2013-02-20 16:27:27 -08:00
Jingning Han	232ccc2fbe	Fixed the buffer overflow issue The issue that potentially broke the encoding process was due to the fact that the length of token link is calculated from the total number of tokens coded, while it is possible, in high bit-rate setting, this length is greater than the buffer length initially assigned to the cpi->tok. This patch increases the initially allocated buffer length assigned to cpi->tok from (mb_rows * mb_cols * 24 * 16) to (mb_rows * mb_cols * (1 + 24 * 16)). It resolves the buffer overflow problem. Change-Id: I8661a8d39ea0a3c24303e3f71a170787a1d5b1df	2013-02-20 15:41:48 -08:00
Dmitry Kovalev	e6c89a1f9b	Merge "Code cleanup." into experimental	2013-02-20 12:47:54 -08:00
Yaowu Xu	441f24de3d	Merge "Merge lossless experiment" into experimental	2013-02-20 12:27:26 -08:00
Dmitry Kovalev	eb6aee50a4	Code cleanup. Change-Id: I7c6e3bebd94856b24dbe2aded7f9e04ef8bb8c08	2013-02-20 11:36:31 -08:00
Yaowu Xu	d262e26cc7	Merge lossless experiment Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa	2013-02-20 07:54:28 -08:00
Paul Wilkins	ef01b956d8	Entropy stats output code. Fixes to make Entropy stats code work again Change-Id: I62e380481a4eb4c170076ac6ab36f0c2b203e914	2013-02-20 14:33:19 +00:00
Tero Rintaluoma	56e6c66b49	Avoid division in intra prediction - Using multiplication and shifting instead of division in intra prediction. - Maximum absolute difference is 1 for division statements in d45, d27, d63 prediction modes. However, errors can accumulate for large block sizes when using already predicted values. - Maximum number of non-matching result values in loops using division are: 4x4 0/16 8x8 0/64 16x16 10/256 32x32 13/1024 64x64 122/4096 Overall PSNR derf: 0.005 yt: -0.022 std-hd: 0.021 hd: -0.006 Change-Id: I3979a02eb6351636442c1af1e23d6c4e6ec1d01d	2013-02-20 10:37:36 +02:00
Yaowu Xu	6b1b341774	Merge "fixed an enc/dec mis-match issue" into experimental	2013-02-19 16:53:30 -08:00
Yaowu Xu	b13f38d4b3	fixed an enc/dec mis-match issue The issue was caused by an out-of-order merge, which led to the wrong functions being called in lossless mode. Change-Id: If157729abab62954c729e0377e7f53edb7db22ca	2013-02-19 16:26:27 -08:00
Jingning Han	cd907b1601	16x16 butterfly inverse ADST/DCT hybrid transform rebased. This patch includes 16x16 butterfly inverse ADST/DCT hybrid transform. It uses the variant ADST of kernel sin((2k+1)*(2n+1)/4N), which allows a butterfly implementation. The coding gains as compared to DCT 16x16 are about 0.1% for both derf and std-hd. It is noteworthy that in the std-hd set many sequences gain about 0.5%, some 0.2%. There are also a few points that show -1% to -3% performance. Hence the average comes to about 0.1%. Change-Id: Ie80ac84cf403390f6e5d282caa58723739e5ec17	2013-02-19 09:07:00 -08:00
Ronald S. Bultje	ae81d3a03f	Merge "Minor cosmetic cleanups." into experimental	2013-02-19 08:54:44 -08:00
Ronald S. Bultje	0694ea0ed6	Merge "Prevent filling transform size cache with uninitialized values." into experimental	2013-02-19 08:54:35 -08:00
Yaowu Xu	93d6b86cfd	Use lossless for Q0 The commit changes the coding mode to lossless whenever the lowest quantizer is chosen. As expected, test results showed no difference for cif and std-hd set where Q0 is rarely used. For yt and yt-hd set, Q0 is used for a number of clips, where this commit helped a lot in the high end. Average over all clips in the sets: yt: 2.391% 1.017% 1.066% hd: 1.937% .764% .787% Change-Id: I9fa9df8646fd70cb09ffe9e4202b86b67da16765	2013-02-19 06:18:42 -08:00
Ronald S. Bultje	aa84c16da2	Minor cosmetic cleanups. Change-Id: I13d8ae754827368755575dd699a087b3b11f5b16	2013-02-15 17:21:16 -08:00
Ronald S. Bultje	ebfdaa0e0b	Prevent filling transform size cache with uninitialized values. The 32x32 value in case of splitmv was uninitialized. This leads to all kinds of erratic behaviour down the line. Also fill in dummy values for superblocks in keyframes (the values are currently unused, but we run into integer overflows anyway, which makes detecting bad cases harder). Lastly, in case we did not find any RD value at all, don't set tx_diff to INT_MIN, but instead set it to zero (since if we couldn't find a mode, it's unlikely that any particular transform would have made that worse or better; rather, it's likely equally bad for all tx_sizes). Change-Id: If236fd3aa2037e5b398d03f3b1978fbbc5ce740e	2013-02-15 17:21:16 -08:00
Ronald S. Bultje	4dfcb129fd	Merge "Remove some unused structs and members from the decoder." into experimental	2013-02-15 17:11:38 -08:00
Ronald S. Bultje	5bb103c486	Merge "Remove Y2 and Y-no-DC token types from the bitstream." into experimental	2013-02-15 17:11:20 -08:00
Jingning Han	e343732a92	Fixed a subtle issue that breaks encoding process This issue breaks the encoding process of the codebase. The effect emerges only in particular test sequence at certain bit-rates and frame limits. Change-Id: I02e080f2a49624eef9a21c424053dc2a1d902452	2013-02-15 14:49:30 -08:00
Ronald S. Bultje	6cde1c58d7	Remove some unused structs and members from the decoder. Change-Id: Ie309cb1f683a51c5dfac405fb32e8e2d6ee143ed	2013-02-15 14:06:30 -08:00
Ronald S. Bultje	3af36ea8cc	Remove Y2 and Y-no-DC token types from the bitstream. Change-Id: I7a5314daca993d46b8666ba1ec2ff3766c1e5042	2013-02-15 14:06:30 -08:00
Ronald S. Bultje	48598e30b1	Remove y2dc/ac Q delta values from the bitstream. Since there is no Y2, these values are always zero. This changes the bitstream results slightly, hence a separate commit. Change-Id: I2f838f184341868f35113ec77ca89da53c4644e0	2013-02-15 14:06:30 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
Scott LaVarnway	7755657ea7	Merge "WIP: ssse3 version of convolve avg functions" into experimental	2013-02-15 07:54:21 -08:00
John Koleszar	716db10f0d	Merge "Moved vp9_get_coef_band to header file" into experimental	2013-02-14 18:02:55 -08:00
Scott LaVarnway	ae886d6bff	Moved vp9_get_coef_band to header file allowing the compiler to inline. Change-Id: I66e5caf5e7fefa68a223ff0603aa3f9e11e35dbb	2013-02-14 12:27:25 -08:00
Yaowu Xu	03f28c0a12	Merge "Rewrote fdct16x16" into experimental	2013-02-14 09:06:37 -08:00
Paul Wilkins	45712dc8c8	Merge "Abstract selection of coef band." into experimental	2013-02-14 03:23:31 -08:00
Yunqing Wang	048b9d41a6	Rewrote fdct16x16 Used same algorithm as others. Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0	2013-02-13 16:19:10 -08:00
Ronald S. Bultje	51afedbe28	Merge "Remove 2nd-order transform for first-order DC coefficients." into experimental	2013-02-13 13:58:02 -08:00
Ronald S. Bultje	89a206ef2f	Add support for tile rows. These allow sending partial bitstream packets over the network before encoding of a complete frame has finished, thus lowering end-to-end latency. The tile-rows are not independent. Change-Id: I99986595cbcbff9153e2a14f49b4aa7dee4768e2	2013-02-13 12:31:00 -08:00
Ronald S. Bultje	42d6be8080	Remove 2nd-order transform for first-order DC coefficients. Since addition of the larger-scale transforms (16x16, 32x32), these don't give a benefit at macroblock-sizes anymore. At superblock-sizes, 2nd-order transform was never used over the larger transforms. Future work should test whether there is a benefit for that use case. Change-Id: I90cadfc42befaf201de3eb0c4f7330c56e33330a	2013-02-13 12:28:19 -08:00
Paul Wilkins	9255ad107f	Abstract selection of coef band. This patch abstracts the selection of the coefficient band context into a function as a precursor to further experiments with the coefficient context. It also removes the large per TX size coefficient band structures and uses a single matrix for all block sizes within the test function. This may have an impact on quality (results to follow) but is only an intermediate step in the process of redefining the context. Also the quality impact will be larger initially because the default tables will be out of step with the new banding. In particular the 4x4 will in this case only use 7 bands. If needed we can add back block size dependency localized within the function, but this can follow on after the other changes to the definition of the context. Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5	2013-02-13 19:01:25 +00:00
Paul Wilkins	56049d9488	Fixed encoder decoder mismatch. Reverted part of change I19981d1ef0b33e4e5732739574f367fe82771a84 That gives rise to an enc/dec mismatch. As things stand the memsets are still needed. Change-Id: I9fa076a703909aa0c4da0059ac6ae19aa530db30	2013-02-13 18:56:56 +00:00
Paul Wilkins	0d284ffed1	Abstract the selection of coefficient context. This is an initial step to facilitate experimentation with changes to the prior token context used to code coefficients to take better account of the energy of preceding tokens. This patch merely abstracts the selection of context into two functions and does not alter the output. Change-Id: I117fff0b49c61da83aed641e36620442f86def86	2013-02-13 18:56:30 +00:00
Paul Wilkins	afa57bfc97	Merge "Remove NEWCOEFCONTEXT experiment." into experimental	2013-02-13 10:41:13 -08:00
Yaowu Xu	f01b08c96c	Merge "enable bitstream lossless support" into experimental	2013-02-13 10:26:58 -08:00
Yaowu Xu	d3de97794f	Merge "fix the lossless experiment" into experimental	2013-02-13 09:54:35 -08:00
Yaowu Xu	17db5d00be	enable bitstream lossless support 1. Added a bit in frame header to indicate if a frame is encoded in lossless mode, so decoder does not make the decision based on Q0 2. Minor changes to make sure that lossy coding works same as when the lossless experiment is not enabled. 3. Renamed function pointers for transforms to be consistent, using prefix fwd_txm and inv_txm for forward and inverse respectively To encode in lossless mode, use "--lossless=1 --min-q=0 --max-q=0" with vpxenc. Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068	2013-02-13 09:24:39 -08:00
Yaowu Xu	16f25f9dc8	fix the lossless experiment Change-Id: I95acfc1417634b52d344586ab97f0abaa9a4b256	2013-02-13 09:20:26 -08:00
Scott LaVarnway	30f866f44b	WIP: ssse3 version of convolve avg functions Initial ssse3 convolve avg functions and is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9	2013-02-13 09:15:38 -08:00
Paul Wilkins	6a9f0c61a4	Remove NEWCOEFCONTEXT experiment. Removal of the NEWCOEFCONTEXT experiment to reduce code clutter and make it easier to experiment with some other changes to the coefficient coding context. Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98	2013-02-13 15:12:17 +00:00
Paul Wilkins	649be94cf0	Removal of Hybrid DWT/DCT experiment. Removal of experiment to simplify code base for other changes. Change-Id: If0a33952504558511926ad212bc311fc2bffb19a	2013-02-13 15:08:48 +00:00
Christian Duvivier	097f205289	Merge "Faster vp9_regular_quantize_b_8x8." into experimental	2013-02-12 17:08:00 -08:00
Christian Duvivier	0e4397f0cd	Faster vp9_regular_quantize_b_8x8. A couple of scalar optimizations speeding up quantization by about 1.6x. Overall encoder speedup is around 3%. Change-Id: I19981d1ef0b33e4e5732739574f367fe82771a84	2013-02-12 15:55:58 -08:00
Yunqing Wang	7630cf0c3f	Merge "Rewrote fdct8x8" into experimental	2013-02-12 15:52:31 -08:00
John Koleszar	1d60b6bcb5	Merge "Replace as_mv struct with array" into experimental	2013-02-12 13:59:04 -08:00
Ronald S. Bultje	f496f601fb	Add tile column size limits (256 pixels min, 4096 pixels max). This is after discussion with the hardware team. Update the unit test to take these sizes into account. Split out some duplicate code into a separate file so it can be shared. Change-Id: I8311d11b0191d8bb37e8eb4ac962beb217e1bff5	2013-02-12 10:33:34 -08:00
Ronald S. Bultje	cb00be1fa2	Merge "Clean up detokenize contextualization to be like tokenizer." into experimental	2013-02-12 09:47:29 -08:00
Scott LaVarnway	ff024f812b	Merge "Bug fix: ssse3 version of subpixel did not match C code" into experimental	2013-02-12 08:45:24 -08:00
Yunqing Wang	aa295918ed	Rewrote fdct8x8 Use consistent algorithm. Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3	2013-02-11 22:28:05 -08:00
Ronald S. Bultje	491d095214	Clean up detokenize contextualization to be like tokenizer. Change-Id: I47174f797df2103da8913c6fb4f4e741817bae82	2013-02-11 17:21:37 -08:00
Christian Duvivier	094e2572df	Faster convolve8_avg. Implement convolve8_avg using common functions which are already optimized instead of using more obscure ones which have only C versions. Encoder overall speed-up of about 12%. Change-Id: I8c57aa76936c8a48f22b115f19f61d9f2ae1e4b6	2013-02-11 16:53:11 -08:00
Jingning Han	f1060e4cd8	Merge "butterfly inverse 4x4 ADST" into experimental	2013-02-11 14:46:06 -08:00
Yunqing Wang	ab2dc6ae57	Merge "Integerization of dct32x32" into experimental	2013-02-11 12:15:26 -08:00
Jingning Han	57e995ff9c	butterfly inverse 4x4 ADST fixed format issues. Implement the inverse 4x4 ADST using 9 multiplications. For this particular dimension, the original ADST transform can be factorized into simpler operations, hence is retained. Change-Id: Ie5d9749942468df299ab74e90d92cd899569e960	2013-02-11 10:42:39 -08:00
Ronald S. Bultje	5f2e8449b7	Merge "Port sadNxNx4d functions to x86inc.asm." into experimental	2013-02-11 08:20:12 -08:00
Paul Wilkins	aec5bed3db	Change rd thresholds and add speed trade off flags. Experimental tweaks to various thresholds to measure quality / speed trade off. Add flag that allows static segmentation to be turned off and disables it unless in the second pass of a two pass encode. Change-Id: I219702ffe858412a83db801cbbbd869924b8c61b	2013-02-11 11:54:36 +00:00
Scott LaVarnway	eda30b410e	Bug fix: ssse3 version of subpixel did not match C code A 16 bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters. (vp9_sub_pel_filters_8lp) Changed the order of the adds to fix this problem. Also added ssse3 support for 4x4 subpixel filtering. Change-Id: I475eaadae920794c2de5e01e9735c059a856518e	2013-02-09 15:15:14 -08:00
Paul Wilkins	e4f949b55a	Merge "Nearest / Zero Mv default entropy tweak." into experimental	2013-02-09 04:21:08 -08:00
John Koleszar	7ca517f755	Replace as_mv struct with array Replace as_mv.{first, second} with a two element array, so that they can easily be processed with an index variable. Change-Id: I1e429155544d2a94a5b72a5b467c53d8b8728190	2013-02-08 20:23:35 -08:00
John Koleszar	dc836109e4	Merge "Pass macroblock index to pick inter functions" into experimental	2013-02-08 20:20:37 -08:00
Ronald S. Bultje	c0ce2ab349	Port sadNxNx4d functions to x86inc.asm. Change-Id: Ic639f5742f7a007753d7a3fa5c66235172eb31d8	2013-02-08 17:59:32 -08:00
Ronald S. Bultje	02ff360b33	Add sad64x64 and sad32x32 SSE2 versions. Also port the 4x4, 16x16, 8x16 and 16x8 versions to x86inc.asm; this makes them all slightly faster, particularly on x86-64. Remove SSE3 sad16x16 version, since the SSE2 version is now faster. About 1.5% overall encoding speedup. Change-Id: Id4011a78cce7839f554b301d0800d5ca021af797	2013-02-08 16:32:25 -08:00
Ronald S. Bultje	639b863d22	Make cost_coeffs() more efficient. Cache the constant offset in one variable to prevent re-loading that in each loop iteration, and mark the function as inline so we can use the fact that the transform size is always known in the caller. Almost 1% faster encoding overall. Change-Id: Id78325a60b025057d8f4ecd9003a74086ccbf85a	2013-02-08 16:32:24 -08:00
John Koleszar	6125a1ed81	Pass macroblock index to pick inter functions Pass the current mb row and column around rather than the recon_yoffset and recon_uvoffset, since those offsets will change from predictor to predictor, based on the reference frame selection. Change-Id: If3f9df059e00f5048ca729d3d083ff428e1859c1	2013-02-08 14:25:40 -08:00
John Koleszar	6dfc95fe63	Merge changes Icd1a2a5a,I204d17a1,I3ed92117 into experimental * changes: Initial support for resolution changes on P-frames Avoid allocating memory when resizing frames Adds a test for the VP8E_SET_SCALEMODE control	2013-02-08 14:20:05 -08:00
John Koleszar	3de8ee6ba1	Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental * changes: Restore SSSE3 subpixel filters in new convolve framework Convert subpixel filters to use convolve framework Add 8-tap generic convolver	2013-02-08 13:19:47 -08:00
John Koleszar	393b485627	Initial support for resolution changes on P-frames Allows inter-frames to change resolution. Currently these are almost equivalent to keyframes, as only intra prediction modes are allowed, but without the other context resets that occur on keyframes. Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3	2013-02-08 12:20:30 -08:00
John Koleszar	c03d45def9	Avoid allocating memory when resizing frames As long as the new frame is smaller than the size that was originally allocated, we don't need to free and reallocate the memory allocated. Instead, do the allocation on the size of the first frame. We could make this passed in from the application instead, if we wanted to support external upscaling. Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038	2013-02-08 12:20:30 -08:00
John Koleszar	88f99f4ec2	Adds a test for the VP8E_SET_SCALEMODE control Tests that the external interface to set the internal codec scaling works as expected. Also updates the test to pull the height from the decoded frame size rather than parsing the keyframe header, in anticipation of allowing resolution changes on non-keyframes. Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17	2013-02-08 12:20:30 -08:00
John Koleszar	29d47ac80e	Restore SSSE3 subpixel filters in new convolve framework This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c	2013-02-08 12:18:14 -08:00
Yunqing Wang	dbccffe299	Integerization of dct32x32 Test on derf set showed 0.047% overall psnr change. Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19	2013-02-08 08:50:47 -08:00
Paul Wilkins	bbede82f24	Nearest / Zero Mv default entropy tweak. Tweak to default mode context to account for the fact that when there are no non zero motion candidates Nearest is now the preferred mode for coding a 0,0 vector. Also resolve duplicate function name and typos. Change-Id: I76802788d46c84e3d1c771be216a537ab7b12817	2013-02-08 10:16:13 +00:00
Yaowu Xu	e6ad9ab02c	move dct/idct constants to a header file also removed some unused functions. Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5	2013-02-07 13:51:45 -08:00
Jingning Han	d15e1da494	Butterfly ADST based hybrid transform Refactor the 8x8 inverse hybrid transform. It is now consistent with the new inverse DCT. Overall performance loss (due to the use of this variant ADST, and the rounding errors in the butterfly implementation) for std-hd is -0.02. Fixed BUILD warning. Devise a variant of the original ADST, which allows butterfly computation structure. This new transform has kernel of the form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures using floating-point multiplications was reported in Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform", IEEE Trans. on ASSP, 1984. This patch includes the butterfly implementation of the inverse ADST/DCT hybrid transform of dimension 8x8. Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d	2013-02-07 10:07:46 -08:00
Paul Wilkins	29731308c4	Added skip switches for SB32 and SB64 Added switches and code to skip/breakout from doing SB32 and SB64 tests based on whether the 16x16 MB tests used split modes. Also to optionally skip 64x64 if 16x16 was chosen over 32x32. Impact varies depending on clip from a few % up to almost 50% on encode speed. Only the split mode breakout is currently enabled. Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8	2013-02-07 10:45:41 +00:00
Ronald S. Bultje	5cfd82bcaf	Use fdct8x4 instead of fdct4x4 where the block size allows it. This allows for faster SIMD implementations in the future (currently there is no speed impact). Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329	2013-02-06 16:13:02 -08:00
Ronald S. Bultje	aac73df1a7	Use configure checks for various inline keywords. Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24	2013-02-06 16:12:56 -08:00
Ronald S. Bultje	a788e0fe63	Add sse2 versions of sub_pixel_variance{32x32,64x64}. 7.5% faster overall encoding. Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f	2013-02-06 11:20:59 -08:00
Ronald S. Bultje	a001fe9708	Merge "Reindent segmentation code." into experimental	2013-02-06 10:07:30 -08:00
Ronald S. Bultje	55cafb6156	Reindent segmentation code. Indentation was off by 2 spaces for this particular block. Change-Id: I1e587b7ad3eff77ade5521252d20c7bb2daa0f6d	2013-02-06 09:18:25 -08:00
John Koleszar	31cbe2ed9a	Eliminate tautology Unreachable code that does nothing anyway removed forever. Change-Id: I14105d2dd9dbc9d558f36464055e350dbeb45488	2013-02-06 08:22:59 -08:00
Paul Wilkins	8b4e9c5925	Merge "Change definition of NearestMV." into experimental	2013-02-06 04:06:31 -08:00
Ronald S. Bultje	278df745d2	Fix mismatch after merge of the tiling patch. Change-Id: I8ecc178b4d4069e721c7fec6d7631c00e4a3e5d5	2013-02-05 17:15:04 -08:00
Ronald S. Bultje	1407bdc243	[WIP] Add column-based tiling. This patch adds column-based tiling. The idea is to make each tile independently decodable (after reading the common frame header) and also independently encodable (minus within-frame cost adjustments in the RD loop) to speed up hardware & software en/decoders if they used multi-threading. Column-based tiling has the added advantage (over other tiling methods) that it minimizes realtime use-case latency, since all threads can start encoding data as soon as the first SB-row worth of data is available to the encoder. There is some test code that does random tile ordering in the decoder, to confirm that each tile is indeed independently decodable from other tiles in the same frame. At tile edges, all contexts assume default values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode), and motion vector search and ordering do not cross tiles in the same frame. Tile independence is not maintained between frames ATM, i.e. tile 0 of frame 1 is free to use motion vectors that point into any tile of frame 0. We support 1 (i.e. no tiling), 2 or 4 column-tiles. The loopfilter crosses tile boundaries. I discussed this briefly with Aki and he says that's OK. An in-loop loopfilter would need to do some sync between tile threads, but that shouldn't be a big issue. Results: with tiling disabled, we go up slightly because of improved edge use in the intra4x4 prediction. With 2 tiles, we lose ~1% on derf, ~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5% on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is concentrated in the low-bitrate end of clips, and most of it is because of the loss of edges at tile boundaries and the resulting loss of intra predictors. TODO: - more tiles (perhaps allow row-based tiling also, and max. 8 tiles)? 
- maybe optionally (for EC purposes), motion vectors themselves should not cross tile edges, or we should emulate such borders as if they were off-frame, to limit error propagation to within one tile only. This doesn't have to be the default behaviour but could be an optional bitstream flag. Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f	2013-02-05 15:43:03 -08:00
Ronald S. Bultje	822864131b	Merge "Add SSE3 versions for sad{32x32,64x64}x4d functions." into experimental	2013-02-05 15:40:46 -08:00
Yaowu Xu	c9ae73b251	Merge "rewrite 4x4 idct and fdct" into experimental	2013-02-05 15:26:36 -08:00
Ronald S. Bultje	58c983d109	Add SSE3 versions for sad{32x32,64x64}x4d functions. Overall encoding about 15% faster. Change-Id: I176a775c704317509e32eee83739721804120ff2	2013-02-05 15:21:47 -08:00
John Koleszar	7a07eea13f	Convert subpixel filters to use convolve framework Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus for whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR, 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff	2013-02-05 14:23:17 -08:00
John Koleszar	5ca6a3667f	Add 8-tap generic convolver This commit introduces a new convolution function which will be used to replace the existing subpixel interpolation functions. It is much the same as the existing functions, but allows for changing the filter kernel on a per-pixel basis, and doesn't bake in knowledge of the filter to be applied or the size of the resulting block into the function name. Replacing the existing subpel filters will come in a later commit. Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91	2013-02-05 14:19:28 -08:00
Yaowu Xu	fa36981ec8	rewrite 4x4 idct and fdct This commit changes the 4x4 iDCT to use same algorithm & constants as other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT. Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0	2013-02-05 11:42:49 -08:00
Paul Wilkins	81043e8d62	Change definition of NearestMV. This commit makes the NearestMV match the chosen best reference MV. It can be a 0,0 or non zero vector which means the compound nearest mv mode can combine a 0,0 and a non zero vector. Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a	2013-02-05 17:03:25 +00:00
Scott LaVarnway	77440d508b	Merge "Added vp9_short_idct1_32x32_c" into experimental	2013-02-05 08:56:05 -08:00
Scott LaVarnway	5780c4cbd5	Added vp9_short_idct1_32x32_c and called this function in vp9_dequant_idct_add_32x32_c when eob == 1. For the test clip used, the decoder performance improved by 21+%. Based on Yaowu's 16 point idct work. Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43	2013-02-04 16:49:17 -08:00
Paul Wilkins	3ab538767c	Re-factor code for rd thresholds. Separate out code to set the main encode speed related rd thresholds. Some values changed from the initial defaults for various new modes. Quality test results pending but even the addition of some further non-zero defaults helps encode speed somewhat in limited testing on derf clips. Adjustment of thresholds for quality / speed tradeoff to follow. Change-Id: I117ee473157e151a1b93193d5f393449328de20d	2013-02-04 18:48:41 +00:00
Yaowu Xu	1eb79dc1dc	re-write 8 point idct to be consistent with idct16 and idct32. Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204	2013-02-04 07:31:25 -08:00
Yaowu Xu	ccaaeb4b5a	a couple of minor fixes fixed function prototypes to prevent compiler warnings; removed a function not in use; un-capitalize "Refstride" to ref_stride Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba	2013-02-04 07:19:32 -08:00
Yaowu Xu	af4c9d2f88	Merge "Changes 16 point idct" into experimental	2013-02-01 08:22:20 -08:00
Yaowu Xu	c1f611be74	Merge "fix a small bug in 16 point forward dct" into experimental	2013-02-01 05:57:41 -08:00
Yaowu Xu	91e0e80142	Changes 16 point idct This commit changes the inverse 16 point dct to use the same algorithm as the one for 32 point idct. In fact, now 16 point dct uses the exact version of the source code for the even portion of the 32 point idct. Tests showed current implementation has significantly better accuracy than the previous version. With this implementation and the minor bug fix on forward 16 point dct, encoding tests showed about 0.2% better compression of CIF set, test results on std-hd setting pending. Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63	2013-01-31 19:52:18 -08:00
Frank Galligan	f67d740b34	Add support for x64 and win64 yasm flags. Some projects must define only win64 for Windows 64bit builds using yasm. Change-Id: I1d09590d66a7bfc8b4412e1cc8685978ac60b748	2013-01-31 16:25:37 -08:00
Yaowu Xu	ab1cad9bdd	fix a small bug in 16 point forward dct The commit fixes a minor error in 16 point fdct wherein a rotation can produce a result of -1 instead of 0. Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade	2013-01-31 15:39:41 -08:00
Yaowu Xu	c94e55add0	Merge "A fix point implementation of 32x32 idct" into experimental	2013-01-31 10:48:01 -08:00
Yaowu Xu	5149d7f7bd	A fix point implementation of 32x32 idct This commit changes the 32x32 idct to use integer only. The algorithm was taken directly from "A Fast Computational Algorithm for the Discrete Cosine Transform" by W. Chen, et al., which was published in IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The signal flow graph in the original paper is for a 32 point forward dct; the current implementation of the inverse DCT was done by following the graph in the reverse direction. With this implementation, the 32 point inverse dct contains a 16 point inverse dct in its even portion; similarly, the 16 point idct further contains 8 point and 4 point inverse dcts. As of patch 4, encoding tests showed there is no compression loss when compared against the floating point baseline. Numbers even showed very small positives. (cif: .01%, std-hd: .05%). Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378	2013-01-31 09:45:49 -08:00
Deb Mukherjee	a53be60904	Merge "Adding a frame parallel decoding mode" into experimental	2013-01-30 12:03:45 -08:00
Ronald S. Bultje	b499c24c2f	Merge "don't code the branch for the predicted seg_id if that flag is false." into experimental	2013-01-30 10:02:51 -08:00
Ronald S. Bultje	3a4b18bc67	don't code the branch for the predicted seg_id if that flag is false. Change-Id: Icb6e21dc0c2d9918faa33c8bf70943660df7ad88	2013-01-30 09:30:46 -08:00
Ronald S. Bultje	4d53a95a34	Merge "Default superblock skip flag to 32x32 for skip-blocks." into experimental	2013-01-30 09:12:17 -08:00
Ronald S. Bultje	de6718a3b9	Merge "Reset skip flag in superblock RD loop." into experimental	2013-01-30 09:12:02 -08:00
Deb Mukherjee	d28750537e	Merge "Further improvement on compound inter-intra expt" into experimental	2013-01-30 08:38:17 -08:00
Ronald S. Bultje	3febf9707d	Default superblock skip flag to 32x32 for skip-blocks. This is identical to the later decisions made in encode_superblock(). This commit doesn't actually change anything, but makes the mbmi state more consistent between the RD loop and the final encode result. Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be	2013-01-29 21:46:31 -08:00
Ronald S. Bultje	b90996c51b	Reset skip flag in superblock RD loop. This is the superblock equivalent of commit `290b83a`. Change-Id: Ib3945dd9e992fa9ec1fdea5a11e17a3cc0e37637	2013-01-29 21:42:56 -08:00
Ronald S. Bultje	2f6fce3e5a	Write only visible area (for better comparison with rec.yuv). Change-Id: I32bf4ee532a15af78619cbcd8a193224029fab50	2013-01-29 16:58:52 -08:00
Frank Galligan	0524f33108	libvpx: Fix warnings on windows. Warnings found when trying to build libvpx in Chromium. Change-Id: I5824d9e2c06351e0cf46e9f5fa102cc8b04cf963	2013-01-29 13:57:09 -08:00
Ronald S. Bultje	5a9da2d906	Merge "Fix block pointer corruption in intra8x8 prediction with 4x4 transform." into experimental	2013-01-29 12:49:42 -08:00
Ronald S. Bultje	64401f838f	Merge "Fix overread/write reported by valgrind if (mb_cols) & 3 != 0." into experimental	2013-01-29 12:49:22 -08:00
Paul Wilkins	d8e86af263	Merge "Remove eob_max_offset markers." into experimental	2013-01-29 09:29:45 -08:00
Paul Wilkins	5d1c62c639	Merge "Segment Skip Flag" into experimental	2013-01-29 09:29:26 -08:00
Scott LaVarnway	8b7eced6fe	Merge "Added eob == 0 check to vp9_dequant_idct_add_32x32_c" into experimental	2013-01-29 09:19:58 -08:00
Ronald S. Bultje	ffc2e4f4af	Fix block pointer corruption in intra8x8 prediction with 4x4 transform. The RD loop would change the pointer after the first mode (DC) was tested, leading to corrupt block objects being provided for the others. This would essentially render the i8x8 predictor useless. Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8	2013-01-29 09:18:47 -08:00
Paul Wilkins	93762ca9b2	Remove eob_max_offset markers. Remove eob_max_offset markers and replace with the generic skip_block flag to indicate to the quantizer that all coeffs are to be set to 0 and the eob position set to 0. Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7	2013-01-29 13:39:34 +00:00
Deb Mukherjee	3b04d467ac	Further improvement on compound inter-intra expt Adds a special combination mode specific to intra prediction mode D45. Current results with the compound inter/intra experiment: derf: 0.2% yt: 0.55% std-hd: 0.75% hd: 0.74% Change-Id: I8976bdf3b9b0b66ab8c5c628bbc62c14fc72ca86	2013-01-29 00:21:29 -08:00

661 Commits