generic-library/vpx

Author	SHA1	Message	Date
Adrian Grange	cf54b88043	Add VP9 decoder control to get frame size Adds a control function that allows the caller to get the size of the current frame. Change-Id: Iddfdedc0f3aa9aa46119f87d294681c82f275c9f	2015-02-13 09:09:49 -08:00
Jingning Han	e69c79e19a	Merge "Fix ioc issue in block_rd_txfm"	2015-02-12 15:07:41 -08:00
Jingning Han	5041aa0fbe	Fix ioc issue in block_rd_txfm Force 64-bit precision in the intermediate steps. Change-Id: I666113d9adcef8975da201d5aa1a13b783d09594	2015-02-12 12:51:39 -08:00
hkuang	aa5c8b757d	Merge "Remove unnecessary border extension when frame size change."	2015-02-12 12:00:32 -08:00
Marco	cc7d981de1	Merge "Add skin detection."	2015-02-12 11:12:27 -08:00
Jingning Han	f4c29ae9ea	Merge "Update partition rate cost in rtc speed 5"	2015-02-12 09:14:49 -08:00
Jingning Han	ee83243daa	Merge "Add mode cost to sub8x8 block mode decision in rtc coding"	2015-02-12 09:14:29 -08:00
Adrian Grange	24a6dbd853	Merge "Add cast to convert double to int"	2015-02-12 07:02:54 -08:00
Marco	56435bb7b6	Add skin detection. Simple skin detection, from vp8; works reasonably on most of the RTC clips, but could miss sometimes. Added debug flag to write out skin map over source input. Change-Id: I2caea7592f1c459047aac46627eeb24a94946464	2015-02-11 17:47:17 -08:00
Yunqing Wang	238707ab4c	Merge "Make vp9_print_modes_and_motion_vectors() work"	2015-02-11 16:58:52 -08:00
James Zern	d8ed558c99	Merge "vp9_thread: prefer pthread.h if available"	2015-02-11 16:50:07 -08:00
James Zern	a139f3f6fc	Merge "vp9_highbd_tm_predictor_16x16: fix win64"	2015-02-11 16:49:53 -08:00
Adrian Grange	053625e4cd	Add cast to convert double to int Change-Id: I7f63c2940256a5dadf9a29a853809290dd9e98ed	2015-02-11 15:59:48 -08:00
Jingning Han	e665c8f2c9	Add mode cost to sub8x8 block mode decision in rtc coding This commit allows the encoder to properly account for the mode cost in sub8x8 non-RD mode decision. Change-Id: I2951960d20e37ed08e372ee0c7044935b2b9b899	2015-02-11 14:43:02 -08:00
Jingning Han	c9725813db	Merge "Account for inter prediction filter rate cost in rtc mode selection"	2015-02-11 14:42:44 -08:00
Jingning Han	532cb435f8	Merge "Add ref frame rate cost to non-RD mode decision"	2015-02-11 14:36:48 -08:00
Jingning Han	7a4e0b2265	Update partition rate cost in rtc speed 5 The block partition rate cost should be updated when recursive partition search is needed. Change-Id: I7bc5ad1fc2cbd3577dee7f7e8da111a2742bdeb9	2015-02-11 12:48:29 -08:00
Jingning Han	41b7f76db1	Account for inter prediction filter rate cost in rtc mode selection Add the rate cost on inter prediction filter type to the overall rate-distortion cost in vp9_pick_mode_inter. Change-Id: I72c34017adf5220cadb3962694ee5404469fc673	2015-02-11 12:17:29 -08:00
Jingning Han	4ce70e8847	Add ref frame rate cost to non-RD mode decision This commit adds a heuristic rate cost of reference frame to the non-RD mode decision. It improves the compression performance of speed -6 by 0.31% and speed -5 by 0.69%. Change-Id: If7f3b45519d49b2cb640bcb7316a254efc8be446	2015-02-11 11:08:10 -08:00
James Zern	923cc0bf51	vp9_highbd_tm_predictor_16x16: fix win64 by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb	2015-02-10 19:34:12 -08:00
Yunqing Wang	f37788eaf6	Make vp9_print_modes_and_motion_vectors() work MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors() didn't work anymore. This patch modified vp9_debugmodes.c so that this function works again for debug usage. Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a	2015-02-10 16:37:02 -08:00
Yaowu Xu	ee5d79995e	Move computation up to frame level This is to avoid redoing the same calculation repeatedly, and also to allow easier adjustments for further experiments. This commit shall have no effect on quality/compression. Change-Id: I4460acf5c808ff5518da18d21e002c5da58af857	2015-02-10 15:41:52 -08:00
hkuang	bf3cb25019	Remove unnecessary border extension when frame size change. This border extension is not needed with on-demand border extension. Change-Id: I8501b37f5f756dc7e874cef4c1cfdbfa9a16112a	2015-02-10 14:55:27 -08:00
James Zern	d167a1aeee	vp9_thread: prefer pthread.h if available this avoids conflicts with recent versions of mingw-w64 (tested g++ 4.8.2) and the unit tests Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088	2015-02-10 12:47:14 -08:00
Adrian Grange	2d924161c7	Merge "Auto-adaptive encoder frame resizing logic"	2015-02-10 12:16:55 -08:00
Jingning Han	f0eea5be2a	Merge "Fix block partition size in fill_mode_info_sb"	2015-02-10 10:49:03 -08:00
Adrian Grange	23ebacdb81	Auto-adaptive encoder frame resizing logic Note: This feature is still in development. Add an option for the encoder to decide the resolution at which to encode each frame. Each KF/GF/ARF group is tested to see if it would be better encoded at a lower resolution. At present, each KF/GF/ARF is coded first at full-size and if the coded size exceeds a threshold (twice target data rate) at the maximum active Q then the entire group is encoded at lower resolution. This feature is enabled in vpxenc by setting: --resize-allowed=1 In addition, if the vpxenc command line also specifies valid frame dimensions using: --resize-width=XXXX & --resize-height=YYYY then all frames will be encoded at this resolution. Change-Id: I13f341e0a82512f9e84e144e0f3b5aed8a65402b	2015-02-10 09:59:32 -08:00
Yunqing Wang	84b813aa42	Merge "Make encoder and decoder share common thread function"	2015-02-10 09:06:41 -08:00
Yunqing Wang	d3a37731c2	Merge "Rename loopfilter_thread files to thread_common files"	2015-02-10 09:06:23 -08:00
Jingning Han	ebb4c9e8e7	Fix block partition size in fill_mode_info_sb This commit fixes the sub block partition size used in fill_mode_info_sb. Previous implementation effectively disabled the rectangular block sizes. This commit resolved this issue. Change-Id: Ic1c383ab0a9a2e7d59e85b388093f1f1f94d1e7f	2015-02-10 08:39:32 -08:00
hkuang	1cdb5439e7	Merge "Set the maximum decode threads to be 8."	2015-02-09 16:25:20 -08:00
Yunqing Wang	07eb8c8da3	Merge "Fix high bit depth assembly function bugs"	2015-02-09 15:30:36 -08:00
hkuang	dd88f48296	Set the maximum decode threads to be 8. This will fix the frame parallel decode hang on windows due to not enough semaphores. This will also make the frame parallel decode safer as the number of frame buffers could only support maximum 8 threads. Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309	2015-02-09 10:38:41 -08:00
hkuang	67b61c7ace	Fix jenkins unit test failure due to "uninitialised value". Change-Id: Ief6b526486bc729dcb787358bc0b781f278bdc66	2015-02-07 15:13:45 -08:00
Yunqing Wang	4ae092c660	Make encoder and decoder share common thread function Moved vp9_accumulate_frame_counts to vp9_thread_common.c to eliminate the duplicate code. Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a	2015-02-06 11:45:51 -08:00
Jingning Han	ba933b90c6	Merge "Re-arrange inter mode search order in RTC coding flow"	2015-02-06 10:11:33 -08:00
Yunqing Wang	41063137c3	Rename loopfilter_thread files to thread_common files Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b	2015-02-06 10:03:31 -08:00
Yaowu Xu	8b5e665098	Merge "Replace repeated check with single variable"	2015-02-06 09:17:59 -08:00
Jingning Han	b2762a8853	Re-arrange inter mode search order in RTC coding flow This commit makes the ZEROMV mode first in the search order to ensure that the zero mv is always checked in the RTC coding mode. It improves the average speed -6 compression performance by 0.3% in both PSNR and SSIM at no visible speed change. Change-Id: I465a7e59f4e20cd84fee3f02ced6f98036945949	2015-02-06 08:52:52 -08:00
James Zern	519b9141ad	Merge "vp9: fix segfault w/corrupt data post frame-parallel merge"	2015-02-06 00:28:10 -08:00
hkuang	1c396f3f8e	Merge "Fix a thread lost bug in frame parallel decode."	2015-02-05 14:07:35 -08:00
hkuang	65f046e29f	Merge "Mute the harmless tsan error in frame parallel decode."	2015-02-05 12:44:04 -08:00
James Zern	0261fb4c4f	vp9: fix segfault w/corrupt data post frame-parallel merge cm->frame_bufs[].idx values were made consistent in: `61c5e94` Use -1 consistently as invalid buffer idx update the initialization in swap_frame_buffers() to match. additionally: - remove some shadowed variables in the former and marked them volatile Change-Id: Ie3f9636c405bd822112bb56bd22d28024ae98909	2015-02-05 12:11:40 -08:00
Yunqing Wang	789ae447f8	Fix high bit depth assembly function bugs The high bit depth build failed while building for 32bit target. The bugs were in vp9_highbd_subpel_variance.asm and vp9_highbd_sad4d_sse2.asm functions. This patch fixed the bugs, and made 32bit build work. Change-Id: Idc8e5e1b7965bb70d4afba140c6583c5d9666b75	2015-02-05 11:24:03 -08:00
Yaowu Xu	c905c42ad8	Remove unnecessary initialization loop_filter_level is always reset in loop_filter_frame() later in encoder. Change-Id: I608e03d905a6b23e7d5025ca747e4784c665007e	2015-02-04 13:56:16 -08:00
Yaowu Xu	581aee001e	Move tx_mode decision logic into select_tx_mode() Change-Id: I7f8f78c33eb3f33344b029a27bda320f4d68c577	2015-02-04 13:54:49 -08:00
Yaowu Xu	19451e6d67	Replace repeated check with single variable Change-Id: I2f6a669bf7c6d9796388ad3f3fa3fc942635c215	2015-02-04 12:59:14 -08:00
Yaowu Xu	a844a778c7	Merge "Adjust partitioning threshold based rtc speed"	2015-02-04 12:52:03 -08:00
Yaowu Xu	3bc0c6576f	Merge "Move calls to avoid unnecessary operations"	2015-02-04 12:51:16 -08:00
hkuang	41e376e494	Mute the harmless tsan error in frame parallel decode. Change-Id: I52565fd90461221f89134997a0782cb1b681df01	2015-02-04 12:39:35 -08:00
Jingning Han	1221641914	Merge "Unify luma and chroma inter predictors in choose_partitioning"	2015-02-04 12:09:21 -08:00
Jingning Han	fb2bac4001	Merge "Save an extra call for setup_pred_plane function"	2015-02-04 12:09:14 -08:00
Jingning Han	ce819d74dc	Merge "Account for chroma component costs in RTC mode decision"	2015-02-04 12:09:01 -08:00
Yaowu Xu	bdfb5f986e	Adjust partitioning threshold based rtc speed On rtc set: speed 7 quality improves about 0.5%; speed 8 quality improves about 1.0%. Encoding time for speed 7 changes from 67804ms to 65889ms. Encoding time for speed 8 changes from 58659ms to 56808ms. Change-Id: Iabcfb53012fc1b9f3326cdbc167e5758b8c7ad30	2015-02-04 11:28:39 -08:00
hkuang	b104b84058	Fix a thread lost bug in frame parallel decode. After syncing the frame worker thread, the available thread count should increase by 1 even if the worker thread does not have a displayable frame to output. Change-Id: I9eeb87720fed82dfe38555286833ff88e8a8e746	2015-02-04 11:07:02 -08:00
Jingning Han	1b9082ec6b	Unify luma and chroma inter predictors in choose_partitioning Change-Id: I8bfc80f4fffb0892e93d3326394a52d1ee3c0f37	2015-02-04 10:02:57 -08:00
Jingning Han	4ccfc7d517	Save an extra call for setup_pred_plane function Reuse the yv12_mb array to fetch the buffer pointers/strides corresponding to the current reference frame. Change-Id: I5276b7494158b2cccef15213be2dc189e9036851	2015-02-04 09:47:14 -08:00
Jingning Han	0c6d3a03e1	Account for chroma component costs in RTC mode decision This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c	2015-02-04 09:45:14 -08:00
Yunqing Wang	b3b7645a2f	vp9_dthread: remove frame_parallel_decoding_mode requirement This patch continues the work to remove frame_parallel_decoding_mode requirement in VP9 multi-threaded tile decoder. In order to do that, the frame counts associated to each thread need to be accumulated together after the frame is decoded. Change-Id: Idba1a756cedfed3c154aef52ed82c8da3bbf9e0c	2015-02-04 09:16:41 -08:00
Johann	3a5d40608e	Merge "Remove unnecessary pointer check"	2015-02-03 17:12:56 -08:00
Yaowu Xu	02537ebbe4	Move calls to avoid unnecessary operations Change-Id: I236f7f75ab9a4511d1b52a6a67299b0e844a103e	2015-02-03 17:01:37 -08:00
Yaowu Xu	cb411108a3	Merge "adjust rtc setting and threshold"	2015-02-03 15:13:52 -08:00
hkuang	70554a21f1	Merge "Remove duplicate code."	2015-02-03 13:37:48 -08:00
Jim Bankoski	d7783cae95	Merge "make low bitrates a lot less blocky"	2015-02-03 13:25:06 -08:00
Johann	ba18609502	Remove unnecessary pointer check The original implementation had the following comment: // Ignore mv costing if mvsadcost is NULL However the current implementation does not allow for this. If x exists then nmvsadcost must not be null. This removes the only warning from -Wpointer-bool-conversion https://code.google.com/p/webm/issues/detail?id=894 Change-Id: I1a2cee340d7972d41e1bbbe1ec8dfbe917667085	2015-02-03 13:03:46 -08:00
Jingning Han	894f0fbd3b	Merge "Assign 2nd ref frame in choose_partitioning"	2015-02-03 12:25:18 -08:00
Jingning Han	ca9c352fc3	Assign 2nd ref frame in choose_partitioning Avoid the use of uninitialized second reference frame for fetching reference block. Change-Id: I9983a0daea829700b3270dc8bf2bcc6d6ea36652	2015-02-03 11:17:51 -08:00
Yunqing Wang	f5b3631621	Merge "vp9_dthread: pass frame counts to decoder functions"	2015-02-03 10:52:02 -08:00
Yaowu Xu	a6b3e01a27	Add mutex initialization in encoder This resolves the encoder crashes on windows. Change-Id: I159d79014cf9279751e403936ce1f84482ae82da	2015-02-03 09:53:08 -08:00
Yunqing Wang	85a9bc04d4	vp9_dthread: pass frame counts to decoder functions The current multi-threaded tile decoder requires that the videos are encoded with frame_parallel_decoding_mode = 1. This requirement is not necessary and is better removed. This patch includes the first part of the work. Change-Id: Ic7695fb3cfe13f9022582c9f0edd2aa6e2e36d28	2015-02-03 09:39:15 -08:00
Jim Bankoski	9f1cf2c8cf	make low bitrates a lot less blocky Remove loop filter skip at speed 7+ because of bad visual artifacts and up the postprocessing. Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d	2015-02-03 06:45:56 -08:00
Yaowu Xu	65a1a3e85d	adjust rtc setting and threshold 1. Adjusted the threshold for coef update computation based on counts of tx used, avoid coef update computation when count is low (<20) 2. Move sf->lpf_pick = LPF_PICK_MINIMAL_LPF to speed 8. Change-Id: I02b44309e40fcdbf135c7934ae067a3f42502d30	2015-02-02 17:43:46 -08:00
hkuang	4ed539f22e	Merge "Fix a bug from merging frame parallel branch into master."	2015-02-02 17:08:42 -08:00
hkuang	94a459522e	Fix a bug from merging frame parallel branch into master. The merge did not merge the fix for issue #850. Change-Id: I0dc1377dbfcb9497fb01a13d4f78ac65bff5eb33	2015-02-02 16:01:17 -08:00
Alex Converse	a79db92c07	Merge "Allow larger encoder configurations."	2015-02-02 12:05:56 -08:00
Yaowu Xu	80e729f601	Merge "Optimize coef update"	2015-02-01 20:08:29 -08:00
hkuang	be6aeadaf4	Try again to merge branch 'frame-parallel' into master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compared with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example: decode frame parallel videos with segmentation enabled is not right sometimes. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Separate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit `a18da9760a`. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02	2015-01-30 21:00:13 -08:00
James Zern	f6c2a6c5d6	vp9: rename 'near' parameters + nearest for consistency near is a reserved word in windows builds so using it as a parameter name may cause build failures with some configurations Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403	2015-01-30 15:52:24 -08:00
Jingning Han	f1ab5c1021	Merge "Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8"	2015-01-30 15:49:14 -08:00
Yaowu Xu	45971abd1d	Optimize coef update 1. move the check of search method of USE_TX_8X8 up one level to avoid operations of build_tree_distributions() 2. count tx used and avoid computation for coef update when one size is not used at all. Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c	2015-01-30 10:16:40 -08:00
Yunqing Wang	3b3e299650	Merge "Fix issues in 32bit PIC enabled build"	2015-01-29 16:41:25 -08:00
Alex Converse	797a2556eb	Allow larger encoder configurations. Allow changing colorspace in the encoder and increasing frame size. Change-Id: I8e7c3b891af29ce420a15beb4f6f9c250245b2bb	2015-01-29 15:07:40 -08:00
Paul Wilkins	68340a3470	Merge "Change to update of rate control factors."	2015-01-29 13:50:52 -08:00
Marco	a80dd52b6e	Merge "Fix to vp9 denoiser."	2015-01-29 09:10:30 -08:00
Paul Wilkins	f752da8ce2	Change to update of rate control factors. Remove damping parameter and use the damping formula introduced by Yaowu Xu in all cases. Change-Id: I18db7e0d0f262d5140102f259ab07821d374d285	2015-01-28 15:44:53 -08:00
Yaowu Xu	ff99a3c750	Simplify update_coef_probs() 1. reduce the size of temporary arrays on stack 2. avoid build_tree_distribution for tx size that is not used at all. Change-Id: I0f8d7124e16a3789d3c15ad24cf02c1c12789e2c	2015-01-28 15:12:42 -08:00
Marco	c0923d4d3a	Fix to vp9 denoiser. Prevent from using wrong mv for denoiser motion compensation. Change-Id: Ifa0f9daabdbdab0900d3c17304059fe0d15de914	2015-01-28 12:07:27 -08:00
hkuang	e8c42fb0bd	Remove duplicate code. (issue #934). Change-Id: Ic8adaaff87aae0b33d9b508f160b48e0ccdaaf4c	2015-01-28 12:00:34 -08:00
Frank Galligan	d1e6b8231a	Merge "Add vp9_sad32x32x4d_neon Neon intrinsic function."	2015-01-28 10:35:50 -08:00
Frank Galligan	eb12d880ab	Merge "Add vp9_sad16x16x4d_neon Neon intrinsic function."	2015-01-27 23:01:44 -08:00
Frank Galligan	80a3a07929	Merge "Add vp9_sad64x64x4d_neon Neon intrinsic function."	2015-01-27 23:01:15 -08:00
Yunqing Wang	10d5e09c87	Fix issues in 32bit PIC enabled build This patch was to fix issue 924: https://code.google.com/p/webm/issues/detail?id=924 The SECTION_RODATA macro was modified to support macho32 format. The sub-pixel functions were modified to pass in 2 more parameters to handle the global offsets for PIC build. Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4	2015-01-27 22:20:21 -08:00
Yaowu Xu	fe2439703d	Merge "move clear_system_state() call before using double"	2015-01-27 12:42:13 -08:00
Frank Galligan	e3167f7fbf	Add vp9_sad32x32x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c	2015-01-27 08:54:00 -08:00
Frank Galligan	9f574d0316	Add vp9_sad16x16x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9	2015-01-27 08:42:17 -08:00
Frank Galligan	54fa956715	Add vp9_sad64x64x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58	2015-01-27 08:33:40 -08:00
Marco	1c4a84c6e9	Merge "aq-mode=3: Update to allow for refresh on modes other than zero-mv."	2015-01-26 19:47:13 -08:00
Yaowu Xu	645b7cdf03	move clear_system_state() call before using double Floating point is used in vp9_convert_qindex_to_q(), so the unit test ActiveMapTest would sometimes cause a run-time error without a proper call to clear_system_state to reset register status. Change-Id: I181e9395148c44a6ca8b97d6e109bd4a152143c6	2015-01-26 18:41:50 -08:00
Paul Wilkins	d231ce4fde	Merge "Adjust active maxq for GF groups."	2015-01-26 18:19:09 -08:00
Yaowu Xu	d987dc4fdb	Merge "Fix MSVC warnings on conversion from int64 to int"	2015-01-26 16:52:30 -08:00
Marco	3f1af6e85e	aq-mode=3: Update to allow for refresh on modes other than zero-mv. Add distortion threshold condition to refresh state of a coding block, and allow for qp adjustment also for some intra modes and non-zero motion modes. Also some code cleanup (remove unused variables/code). Change-Id: I735fa2b28bc64f60e0323976b82510577b074203	2015-01-26 16:44:25 -08:00
Paul Wilkins	fd070220ff	Adjust active maxq for GF groups. Currently disabled by default: enabled using #define GROUP_ADAPTIVE_MAXQ In this patch the active max Q is adjusted for each GF group based on the vbr bit allocation and raw first pass group error. This will tend to give a lower q for easy sections and a higher value for very hard sections. As such it is expected to improve quality in some of the easier sections where quality issues have been reported. This change tends to hurt overall psnr but help average psnr. SSIM also shows a small gain. Average results for derf, yt, std-hd and yt-hd test sets were as follows (%change for average psnr, overall psnr and ssim): derf +0.291, -0.252, -0.021; yt +6.466, -1.436, +0.552; std-hd +0.490, +0.014, +0.380; yt-hd +5.565, -1.573, +0.099 Change-Id: Icc015499cebbf2a45054a05e8e31f3dfb43f944a	2015-01-26 14:55:36 -08:00
Yaowu Xu	6d16f6c14c	Fix MSVC warnings on conversion from int64 to int Change-Id: I7e96509ffa36899fcd2935749927a1e8aac8d025	2015-01-26 10:54:06 -08:00
Frank Galligan	9f6eba419a	Add Neon intrinsic vp9_fdct8x8_quant_neon On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51	2015-01-24 22:49:50 -08:00
Yaowu Xu	643c75d90b	Merge "Replace divide with look-up"	2015-01-23 21:12:18 -08:00
Jingning Han	9bdc0ae2b2	Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8 Add parentheses to bit operations. Change-Id: I095d601f0631d055adc4b3a8fde70c9cbae9e749	2015-01-23 11:48:58 -08:00
JackyChen	65f60f8e8c	Merge "SSE2 code for the filter in MFQE."	2015-01-23 11:08:16 -08:00
Adrian Grange	0e2e2c2652	Merge "Remove elevate_newmv_thresh from SPEED_FEATURES (unused)"	2015-01-23 09:57:03 -08:00
Yaowu Xu	eda179764f	Replace divide with look-up This commit replaces an integer divide with a table-lookup. It is to improve decoding speed, and at the same time, to reduce possible complications with a bug in AMD Family 12h processors: "665 Integer Divide Instruction May Cause Unpredictable Behavior" Change-Id: I678b707a538798a923850bac467e66e847e6def7	2015-01-23 09:02:07 -08:00
Johann	a18da9760a	Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch." This reverts commit `bde04ce503` Change-Id: I053dae04c761b04a36dc239558503905a14d2470	2015-01-23 08:42:02 -08:00
hkuang	bde04ce503	Merge branch 'frame-parallel' to enable frame parallel decode in master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compared with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading which will make devices play 1080P VP9 videos more easily. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Separate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64	2015-01-22 18:18:53 -08:00
Adrian Grange	527e073163	Remove elevate_newmv_thresh from SPEED_FEATURES (unused) Change-Id: I78ef7f89586a329787f6bc4c58ec83af210989a3	2015-01-22 16:12:50 -08:00
Marco	0dccb6277c	Modify variance partition selection for low resolutions. For low spatial resolutions: bias partition selection to smaller block sizes, and base the variance computation on 4x4 down-sampling. Also move the threshold computations into the choose_partitioning, so they are computed once for each sb block. On low-res clips (RTC_derf) PSNR/SSIM metrics increase by about 4-5%. No change for resolutions above CIF. Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0	2015-01-22 15:16:55 -08:00
Paul Wilkins	cf3202132f	Merge "Bug when last group before forced key frame is short."	2015-01-22 08:28:19 -08:00
Paul Wilkins	0bff1efc2b	Bug when last group before forced key frame is short. Just before a forced key frame we often get a foreshortened arf/gf group. In such a case, we do not want to update rc->last_boosted_qindex, which is used to define the Q range for the forced key frame itself. This gives a small average metrics gain for the YT and YT-HD sets (eg. YT SSIM +0.141%). Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9	2015-01-21 15:25:57 -08:00
JackyChen	cd0830f452	Merge "Fix compile error in Chromium building."	2015-01-21 14:52:32 -08:00
JackyChen	25a19b48ff	Fix compile error in Chromium building. The comparison of address in the condition is not necessary, since they will constantly be non-null. Change-Id: Id0b0075283f5af65215d5761a8160a4cb2a15c9b	2015-01-21 12:59:25 -08:00
Alex Converse	910ca857df	Allow external resize via vpx_codec_enc_config_set Change-Id: I3d324e2baa4de2d266c5f7ca7b635b62372e90a7	2015-01-21 11:33:06 -08:00
Yaowu Xu	c97d435243	Merge "Replace "colorspace" with "color_space""	2015-01-21 08:58:09 -08:00
Frank Galligan	469ff48d7b	Merge "Add Neon intrinsics for vp9_avg_8x8_neon"	2015-01-20 14:38:39 -08:00
Yunqing Wang	6d7b7abf52	Add non420 code in multi-threaded loopfilter Added non420 part back to make it consistent with single thread code in vp9_loopfilter.c. Change-Id: I8ca255d73bffebae294d2627d6655eafe535cb90	2015-01-20 09:31:47 -08:00
Yunqing Wang	7b232717af	Merge "vp9_ethread: add parallel loopfilter"	2015-01-20 09:27:08 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
Frank Galligan	cc2da09d42	Fix variance Neon intrinsics > 32x32 The 16 bit sum vector was overflowing. Change-Id: I0fdf38e832ee99457ec8680a92691a6175ff8c3f	2015-01-17 10:31:48 -08:00
Yunqing Wang	e76eaf05b1	vp9_ethread: add parallel loopfilter 1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4	2015-01-16 17:19:27 -08:00
Jingning Han	0220255fa0	Merge "Fix frame buffer swap in denoiser"	2015-01-16 16:58:37 -08:00
Jingning Han	dfda5cebc7	Fix frame buffer swap in denoiser This commit fixes a bug in denoiser reference frame buffer swap, which disables frame buffer update. Change-Id: I39a9427180fd18f9692602064ad821f7af4714c0	2015-01-16 12:29:58 -08:00
Yaowu Xu	bc5d3fae5c	Replace "colorspace" with "color_space" This is to make the usage of the variable name consistent across the code base. Change-Id: I698739e55841c59358d1c6e5cc97c96088772943	2015-01-15 17:58:47 -08:00
Minghai Shang	220bc3a013	[two pass temporal svc]Fix crash issue in transcoder app caused by last fix. Change-Id: I78ecc8ec3fa3ba5f69bb23813e68a5255d0534e1	2015-01-15 16:59:54 -08:00
Frank Galligan	6e7e1cf32f	Add Neon intrinsics for vp9_avg_8x8_neon On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee	2015-01-15 15:32:40 -08:00
Yunqing Wang	99b99831e4	Align thread data in vp9_ethread On some platforms, such as 32bit Windows and 32bit Mac, the allocated memory isn't aligned automatically. The thread data is aligned to ensure the correct access in SIMD code. Change-Id: I1108c145fe982ddbd3d9324952758297120e4806	2015-01-14 15:51:56 -08:00
Yaowu Xu	829a01dbb7	Merge "Add encoder control for setting color space"	2015-01-14 14:14:34 -08:00
Frank Galligan	c7d6c0c5a8	Merge "Switch remaining Neon variance functions to shifts"	2015-01-14 12:17:42 -08:00
Frank Galligan	68224a6e87	Merge "Add 64x64 sub_pel_variance Neon function"	2015-01-14 12:17:20 -08:00
Yaowu Xu	e94b415c34	Add encoder control for setting color space This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650	2015-01-14 10:17:14 -08:00
Yaowu Xu	afae733eed	Merge "Enable decoder to pass through color space info"	2015-01-14 10:04:15 -08:00
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	588f74f8a6	Switch remaining Neon variance functions to shifts Saves 5 instructions on 8x8 and 16x16 and 8 instructions on 32x32, when compiled with 4.9. Change-Id: Id3da613a36a9d27d8c5169c59ba45d247c920c6c	2015-01-14 07:22:49 -08:00
Frank Galligan	bd3dbc588c	Merge "Add 64x variance Neon functions"	2015-01-13 22:38:58 -08:00
Minghai Shang	a14415d171	[twopass temporal svc] Fix decoding error on seek. Don't put small empty frame in front of a key frame. We will put key frame flag in webm container if there's a visible key frame. But there will be decoding error when we seek to here if we put the small empty frame, which will be inter frame, in front of it. Change-Id: Id50c2c1fd31da0405ff6faa7375cc2f49c55402d	2015-01-13 15:44:22 -08:00
Yaowu Xu	6b223fcb58	Enable decoder to pass through color space info This commit added a field to vpx_image_t for indicating color space, the field is also added to YUV_BUFFER_CONFIG. This allows the color space information pass through the decoder from input stream to the output buffer. The commit also updated compare_img() function with added verification of matching color space to ensure the color space information to be correctly passed from encode to decoder in compressed vp9 streams. Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371	2015-01-13 15:13:19 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
Yaowu Xu	6f6fbf9175	Merge "Added plumbing for setting color space"	2015-01-13 09:20:13 -08:00
Yaowu Xu	fe3f21099f	Merge "Fix comments and color format"	2015-01-11 14:01:36 -08:00
Yaowu Xu	ce52b0f8d3	Added plumbing for setting color space Change-Id: If64052cc6e404abc8a64a889f42930d14fad21d3	2015-01-09 10:54:25 -08:00
Yaowu Xu	ecbca31a1d	Fix comments and color format Replaced "color space" with "color format" in comments where color sampling format is concerned, so to differentiate from the concept defined in COLOR_SPACE. Change-Id: I8c935034c166b24307a99352dab1686531276bb8	2015-01-09 10:36:43 -08:00
Paul Wilkins	ccffe318ff	Merge "Use 64 bit to accumulate frame sse."	2015-01-09 06:05:11 -08:00
Jingning Han	ae537c151b	Merge "Refactor mc reference block fetch in denoiser"	2015-01-08 17:56:53 -08:00
James Zern	4d6838627d	Merge "vp9: add per-tile longjmp error handling"	2015-01-08 15:53:37 -08:00
James Zern	44b55dada8	Merge "vp9: fix -Wclobbered (longjmp + local variables)"	2015-01-08 15:53:02 -08:00
Jingning Han	a1daf009be	Merge "Use lookup table to find pixel numbers in block"	2015-01-08 13:58:35 -08:00
Johann	00bbe342c2	Merge "Disable vp9 _8_ loopfilters"	2015-01-08 12:47:52 -08:00
Jingning Han	a0be730eae	Refactor mc reference block fetch in denoiser This commit refactors the motion compensated reference block fetch process in denoiser. It skips the stage that generates motion compensated reference block if denoiser decides to use copy block mode. For high motion clips, this could speed up the denoising process by about 10%. Change-Id: I8ef4fa5fe766a8c4529119b9ec01faefb3d4ef53	2015-01-08 12:43:08 -08:00
Jingning Han	e3f0b19f3f	Use lookup table to find pixel numbers in block This could save one multiplication in each threshold function called by the denoiser per block. Change-Id: I35f437e09999f0a087180878ef7805f0d86e5819	2015-01-08 12:32:28 -08:00
Jingning Han	e535ad5067	Merge "Refactor denoiser frame buffer update"	2015-01-08 11:16:14 -08:00
Jingning Han	97dc782635	Merge "Initalize zeromv_sse and newmv_sse in vp9_pick_inter_mode"	2015-01-08 10:55:03 -08:00
Jingning Han	f1866a5792	Merge "Use vp9_convolve_copy in denoiser output"	2015-01-08 09:59:10 -08:00
hkuang	9130e7ad2e	Merge "Remove unnecessary init_macroblockd."	2015-01-08 09:15:32 -08:00
Jingning Han	ea061a885d	Refactor denoiser frame buffer update Use frame buffer pointer swap instead of memcpy when possible. These two CLs make the denoiser when running on vidyo1 720p at speed -6 over 10% faster. Change-Id: I64fe8a2422cafca6787a50c7f4dfb961191c0a9d	2015-01-07 18:33:13 -08:00
Jingning Han	29a5deb40c	Use vp9_convolve_copy in denoiser output Replace copy_block with vp9_convolve_copy for speed performance improvement. Change-Id: I3a08c4d01dff2253b6ee573efd02f65ccdc1b5a5	2015-01-07 18:23:17 -08:00
Zoe Liu	4cf636a60e	Removed redundant local variables in the forward hybrid transforms. Change-Id: I60f7ccbbc8dc624134e325bdce6042bc183075b6	2015-01-07 16:38:29 -08:00
Yaowu Xu	01eec75858	Merge "Refactor calculation of tile_cols"	2015-01-07 16:24:57 -08:00
Jingning Han	08055b639a	Merge "Always check and free denoiser buffer memory space"	2015-01-07 15:54:06 -08:00
Jingning Han	e42b3ee765	Initalize zeromv_sse and newmv_sse in vp9_pick_inter_mode These two parameters are used to control the denoiser cut-off thresholds. They should be properly initialized when starting mode search of a given block. Change-Id: Iba8a25487026a0dbe0d350c347d7e4e4e237b637	2015-01-07 15:32:41 -08:00
JackyChen	1883c940b9	Merge "Use qdiff to adjust the threshold of sad and variance in MFQE."	2015-01-07 14:57:46 -08:00
Yaowu Xu	e9cf9b7dfe	Refactor calculation of tile_cols Change-Id: I2c38ea2bcf6d221a0b6b2fb9be4cebbee21006a3	2015-01-07 14:28:59 -08:00
Jingning Han	b208439b5a	Merge "Fix best ref frame rd cost update in sub8x8 non-RD mode search"	2015-01-07 14:06:55 -08:00
Jingning Han	3e41563f33	Merge "Format fix in vp9_pick_inter_mode_sub8x8"	2015-01-07 14:06:06 -08:00
Jingning Han	802b798f67	Fix best ref frame rd cost update in sub8x8 non-RD mode search This fixes the issue that sub8x8 inter blocks always end up with GOLDEN_FRAME. Change-Id: Id0c25cbb9c2003f43b4dff8fb1572512c246e077	2015-01-07 12:00:02 -08:00
Jingning Han	c3fd9bbdaf	Format fix in vp9_pick_inter_mode_sub8x8 Replace ref_frame++ with ++ref_frame. Change-Id: Ic39793081156c314bf1b85d5ab76def97f3bff52	2015-01-07 11:50:36 -08:00
Jingning Han	59f29f5e3f	Merge "Fix denoiser chroma component initialization"	2015-01-07 11:30:15 -08:00
Jingning Han	9a0e694182	Merge "Skip duplicate denoiser frame buffer allocation"	2015-01-07 11:30:07 -08:00
Johann	d12f1f907d	Merge "Rearrange loopfilter functions"	2015-01-07 11:07:54 -08:00
Deb Mukherjee	ba93f0a201	Merge "Moves inter mode count updates to update_stats"	2015-01-07 09:38:32 -08:00
JackyChen	60cf5cf7b2	Use qdiff to adjust the threshold of sad and variance in MFQE. When qdiff is larger, the sad/variance threshold should also be higher which indicates a more aggressive action on MFQE. Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185	2015-01-07 09:07:10 -08:00
Jingning Han	ce08006951	Always check and free denoiser buffer memory space The vp9_denoiser_free() function will internally check if the buffer pointers are NULL. This commit makes the encoder always call vp9_denoiser_free() after finishing encoding. It protects the case where noise_sensitivity_level is changed during encoding process and happen to be turned off towards the end of sequence, which could result memory space allocated to denoiser not being released. Change-Id: Ie20dc2f2e6e5fb6333fbab3356bc153978a6a0f8	2015-01-07 08:50:13 -08:00
Jingning Han	2fb9b635bb	Fix denoiser chroma component initialization Use the correct frame size and stride value for chroma components when setting the initial values. These control parameters are assigned when the denoiser buffer was allocated and initialized. Change-Id: Ia6318194c7738aff540bcbd34f77f0dac46221a1	2015-01-07 08:49:59 -08:00
Jingning Han	27582e573b	Skip duplicate denoiser frame buffer allocation Allocate the frame buffer allocation for denoiser once during the encoder initialization. This avoids allocating frame buffer multiple times and overwriting the buffer pointer without proper releasing. Change-Id: I9b3baa6283449d86fd164534d344c036bb035700	2015-01-07 08:49:04 -08:00
Paul Wilkins	a3c1a9b419	Use 64 bit to accumulate frame sse. When testing frame sse to choose a loop filter value and when checking ambient error in kf Q selection, use 64 bit values for accumulating the sse, to avoid risk of overflow for large image formats. Change-Id: I03765d16c843d0ade61a45b0cd46312472697e57	2015-01-07 14:13:16 +00:00
Johann	377b6682f9	Disable vp9 _8_ loopfilters Investigating https://code.google.com/p/chromium/issues/detail?id=443839 Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb	2015-01-06 19:26:11 -08:00
Johann	b1ba4cc394	Rearrange loopfilter functions Separate functions and rename files. This will make it easier to disable some functions later to help work around a compiler issue in chromium. Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5	2015-01-06 19:26:11 -08:00
Yaowu Xu	7ba6a676f5	Merge "Use -1 consistently as invalid buffer idx"	2015-01-06 17:31:13 -08:00
Deb Mukherjee	e7570493b8	Moves inter mode count updates to update_stats This makes the inter_mode counts update consistent with other symbols. Also, forward updates should work correctly now. Change-Id: Id98be26fd08875162e644bb8f1de6f0918f85396	2015-01-06 16:40:45 -08:00
Yaowu Xu	61c5e94e22	Use -1 consistently as invalid buffer idx Instead of mixed use of both -1 and INT_MAX. This also fixes a vp9 fuzzing test failure. Change-Id: I950ea94b44ec7cdb5232773bee30b104e342f52a	2015-01-06 15:59:03 -08:00
Deb Mukherjee	0c2ee67ad6	Merge "Enable coefficient range checking for 10-/12-bit"	2015-01-06 14:59:08 -08:00
Yaowu Xu	0979dbb37b	Merge "Fix compiler warnigns for msvc2013"	2015-01-06 08:01:47 -08:00
Yaowu Xu	262f66e6ed	Merge "Properly validate data size"	2015-01-06 08:01:30 -08:00
Paul Wilkins	a88e4e64b1	Merge "Deleted unused #define"	2015-01-06 04:18:20 -08:00
Deb Mukherjee	0ce2a27e9b	Enable coefficient range checking for 10-/12-bit Also fixes a broken build with --enable-coefficient-range-checking configuration option. Change-Id: Icc536f53088e8cec59dfb8f635668555fdb9125e	2015-01-06 02:40:51 -08:00
Yaowu Xu	9c061ef506	Properly validate data size With "show_existing_frame" frames: Minimum data size for profile 0 and 1 is 1 byte (8bits) Minimum data size for profile 2 and 3 is 2 bytes (9bits) Otherwise: Minimum data size is 8 bytes. This resolves the VP9 failure in fuzzing test build #56. Change-Id: I146d9d37688f535dd68d24aacc76d464ccffdf04	2015-01-05 17:34:31 -08:00
Yaowu Xu	364b92dc88	Fix compiler warnigns for msvc2013 Change-Id: I1e32bf8f6872a6fb7e9cabe86483e94805e2f790	2015-01-05 17:31:19 -08:00
Jingning Han	1da0402eff	Merge "Fix denoised video output function"	2015-01-05 15:28:29 -08:00
JackyChen	fe23539d58	Adopt weighted averaging in MFQE. By using weighted averaging in the calculation of the frames to be displayed, we get an average gain of more than 1 db for key frames whose base qp are 20 higher than non-key frames. Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c	2015-01-05 11:38:42 -08:00
Jingning Han	21c0306187	Fix denoised video output function This commit fixes the buffer alignment control in denoised video output function. The encoder is now able to properly store the denoised input video into provided file when enabled. Change-Id: I258e272c8d4a9b52592e16d6d09976c6f5c21728	2015-01-03 21:39:32 -08:00
Jingning Han	2fe1bfa5ad	Merge "Remove redundant local variable for segment_id"	2015-01-02 14:48:27 -08:00
Jingning Han	5516fdd8d0	Remove redundant local variable for segment_id Use mbmi->segment_id directly in vp9_pick_inter_mode. The value is set outside this function, hence no need to assign it again. Change-Id: I3d63cdd2e4fadf62ccdefada638b00d979eb3741	2015-01-02 12:25:14 -08:00
Jingning Han	0d2d3321af	Merge "Add bsize check condition in nonrd_use_partition"	2015-01-02 11:50:57 -08:00
Jingning Han	5486db185c	Add bsize check condition in nonrd_use_partition Check if block size is below 8x8 for rectangular block coding. It is added to support 4x8 and 8x4 block coding for RTC mode. Change-Id: I760b328f45b98ae48adc45ed5a39fb643cd8aebd	2015-01-02 10:12:37 -08:00
Jingning Han	59cfaa538e	Merge "Use less tmp motion vectors in vp9_pick_inter_mode_sub8x8"	2015-01-02 10:00:45 -08:00
Jingning Han	5c31fd5c6d	Merge "Enable sub8x8 inter block search for RTC coding mode"	2015-01-02 10:00:35 -08:00
hkuang	aa5563cd41	Remove unnecessary init_macroblockd. macroblockd are init again inside decode_tiles and decode_tiles_mt. Change-Id: I1f42837864f095c319cdb24cec7d6aa6a3a4da50	2014-12-30 15:23:52 -08:00
Jingning Han	2baccb18a0	Use less tmp motion vectors in vp9_pick_inter_mode_sub8x8 This commit simplifies the reference motion vector part for sub8x8 block coding in RTC mode and reduces the required local variables. Change-Id: I470d1482092563b68af22404dc1f497e7457b0a8	2014-12-30 13:16:12 -08:00
Jingning Han	f5d574c566	Merge "Set ref frame scaling factor in RTC inter mode decision"	2014-12-29 14:20:22 -08:00
Jingning Han	dad89d5ca1	Enable sub8x8 inter block search for RTC coding mode This commit enables sub8x8 inter block coding for RTC mode. The use of sub8x8 blocks can be turned on by allowing choose_partitioning function to select 4x4/4x8/8x4 block sizes. Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a	2014-12-24 17:40:31 -08:00
Jim Bankoski	b3c66f8a2f	WIP: Remove giant value cost table Change-Id: Iabe8a8868a747626c24bb13f1796f4c7827af367	2014-12-23 15:06:17 -08:00
Jingning Han	eb1795f643	Set ref frame scaling factor in RTC inter mode decision Properly set the corresponding scaling factor of the reference frame in the non-RD mode decision process. This allows the mode search process to account for the scaled reference frame when selecting coding mode. Change-Id: I9d41bff6931c98e5a82b413e37ac5e6e14b93b23	2014-12-23 09:33:58 -08:00
James Zern	59d63e610a	vp9: fix -Wclobbered (longjmp + local variables) Local variables used at the setjmp() site need to be marked volatile. Relevant excerpt from the 'man longjmp': =============== The values of automatic variables are unspecified after a call to longjmp() if they meet all the following criteria: · they are local to the function that made the corresponding setjmp(3) call; · their values are changed between the calls to setjmp(3) and longjmp(); and · they are not declared as volatile. =============== Change-Id: I093e6eeeedbf5f781d202248ca701ba2c29d3064	2014-12-23 11:44:11 -05:00
Jim Bankoski	4e04fa6dea	Merge "make vp9_coef_encodings const"	2014-12-22 15:05:25 -08:00
Jim Bankoski	fc954c7c03	Merge "remove static initializers for partition tree"	2014-12-22 13:49:57 -08:00
Jim Bankoski	d6d431c476	Merge "Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."""	2014-12-22 13:43:56 -08:00
Jim Bankoski	fba0ead543	Merge "Tokenization without huge tables."	2014-12-22 13:36:38 -08:00
Jim Bankoski	a5f7d78a06	make vp9_coef_encodings const Change-Id: I28a3d342a4a4b23e02a0f47bb8037c4403f71d61	2014-12-22 13:35:56 -08:00
Jingning Han	d0f2377027	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."" This reverts commit `9946ee23e0`. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07	2014-12-22 10:09:25 -08:00
Jim Bankoski	4b8c6d96ec	Tokenization without huge tables. Change-Id: Iff528c4b7528cc70320343b3a7ce07a92b024dfd	2014-12-22 08:42:52 -08:00
Jim Bankoski	17ee87b46c	convert extra bit cat structure to const statics Change-Id: Idb257e78dab2339ab1f41c3c82e537bc23e90b65	2014-12-22 06:57:50 -08:00
Jim Bankoski	3d94b9bf24	Merge "resolve visual studio warnings around initializers"	2014-12-19 15:18:04 -08:00
Jim Bankoski	dd4275e498	resolve visual studio warnings around initializers Change-Id: Id2ad4fb24242f7ca8fa7a152f0889fded4113613	2014-12-19 12:38:25 -08:00
James Zern	953dd1894d	vp9: add per-tile longjmp error handling this avoids longjmp'ing from another thread on error which will cause undesired behavior Change-Id: Ic9074ed8cc4243944bf2539d6e482f213f4e8c86	2014-12-19 11:50:04 -08:00
Jingning Han	1b5d612b5d	Merge "Add a guard on intra mode skip control for RTC mode"	2014-12-19 11:03:00 -08:00
Jingning Han	9c93307c10	Merge "Remove ARF mode entries from THR_MODES array in non-RD mode"	2014-12-19 11:02:51 -08:00
Jingning Han	cb01baa0fa	Merge "Rework mode search threshold update for RTC coding mode"	2014-12-19 11:02:40 -08:00
Jingning Han	a8e6d4d041	Merge "Properly store the tx_size of selected intra mode"	2014-12-19 11:02:37 -08:00
Paul Wilkins	9946ee23e0	Revert "Removal of legacy zbin_extra / zbin_oq_value." This reverts commit `e9b586e21b`. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4	2014-12-19 15:02:58 +00:00
Paul Wilkins	8ac3f9adaa	Merge "Removal of legacy zbin_extra / zbin_oq_value."	2014-12-19 03:37:02 -08:00
James Zern	b32ba09d35	Merge "make vp9 encoder static initializers thread safe"	2014-12-18 18:48:30 -08:00
Jim Bankoski	cd60930814	make vp9 encoder static initializers thread safe Change-Id: If2d0888d13ebe52bc7c3b16f16319408a86ab6de	2014-12-18 15:50:46 -08:00
Jingning Han	6ec0ef6691	Add a guard on intra mode skip control for RTC mode This commit adds a guard condition to the intra mode test skip control in RTC coding mode. If all inter modes are skipped, force the encoder to check intra mode. It avoids situations where the encoder processes without properly assigning required mode information. Change-Id: Ibb349fee997d6584ce901d08b06e8df3ca9c01b1	2014-12-18 12:00:27 -08:00
Paul Wilkins	e9b586e21b	Removal of legacy zbin_extra / zbin_oq_value. zbin extra / zbin_oq_value was widely passed around, hence removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5	2014-12-18 16:49:11 +00:00
Paul Wilkins	60e9b731cf	Remove mode dependent zbin boost. Initial patch to remove get_zbin_mode_boost() and cpi->zbin_mode_boost. For now sets a dummy value of 0 for zbin extra pending a further clean up patch. Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af	2014-12-18 16:45:52 +00:00
Paul Wilkins	2e39817f5e	Merge "Improve motion detection for low complexity regions."	2014-12-18 08:38:21 -08:00
hkuang	b7166143d0	Let YUV plane share the same dqcoeff buffer. Remove unnecessary dqcoeff from macroblockd which reduce macroblockd size by 16384 bytes. Change-Id: Ia379a703b4fee81c8fd4698b52488a85a90c9bc2	2014-12-17 18:29:07 -08:00
Jingning Han	dd0602e01c	Remove ARF mode entries from THR_MODES array in non-RD mode The alternate reference frame is disabled in non-RD mode. No need to keep the related entries in the THR_MODES array. Change-Id: I53386f4bb1c6284f582801f27246c5edf55bc24b	2014-12-17 17:13:15 -08:00
Yaowu Xu	09b9a59fb5	Merge "Corrected value range of --cpu-used for vp9"	2014-12-17 17:12:14 -08:00
Jingning Han	455514a683	Rework mode search threshold update for RTC coding mode In RTC coding mode, the alternate reference frame modes and compound inter prediction modes are disabled. This commit reworks the related mode search threshold update process to skip interacting with these coding modes. It provides about 1.5% speed-up for speed -6 on average. vidyo1 16551 b/f, 40.451 dB, 6261 ms -> 16550 b/f, 40.459 dB, 6190 ms nik720p 33316 b/f, 38.795 dB, 6335 ms -> 33310 b/f, 38.798 dB, 6237 ms mmmoving 33265 b/f, 41.055 dB, 7176 ms -> 33267 b/f, 41.064 dB, 7084 ms dark720 33329 b/f, 39.729 dB, 11235 ms -> 33331 b/f, 39.733 dB, 10731 ms Change-Id: If2a4090a371cd28f579be219c013b972d7d9b97f	2014-12-17 15:56:01 -08:00
Yaowu Xu	a16f075375	Corrected value range of --cpu-used for vp9 This commit removes undefined value options of cpu-used for VP9 and changed vpxenc prompt to reflect the usable range of [-8,8] Change-Id: Ib80fef3dbb6ec9aabac45ed13e8ab6fbaf94f55e	2014-12-17 15:18:01 -08:00
JackyChen	9bc7974552	Merge "Add rectangle block support for MFQE."	2014-12-17 15:10:02 -08:00
Jim Bankoski	fd96deb06c	remove static initializers for partition tree Could have problem with 2 encoders. Change-Id: I92d326933c00fee688f77b54acf467ca5a8516bc see: https://code.google.com/p/webm/issues/detail?id=900&thanks=900&ts=1418843841	2014-12-17 11:41:06 -08:00
JackyChen	021e244a51	Merge "Use bit_depth in VP9Common as the flag of highbit."	2014-12-17 09:30:32 -08:00
Jingning Han	56a8bc54a6	Properly store the tx_size of selected intra mode Use a temporary variable to store the transform size associated with the best intra mode and restore the mode_info if the overall best mode is intra mode. Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea	2014-12-17 09:25:14 -08:00
Jingning Han	00d2211929	Merge "Remove reset mode_info array per frame"	2014-12-17 09:24:44 -08:00
Jingning Han	cc8a11d8a1	Merge "Set second ref frame to be NONE in key frame coding"	2014-12-17 09:24:39 -08:00
Paul Wilkins	b76312124d	Deleted unused #define FAST_MOTION_MV_THRESH no longer referenced. Change-Id: Idee6ee5a59ba330904c42b20c9ec35b6fc16f7a2	2014-12-17 14:59:22 +00:00
JackyChen	b363cedcd1	Use bit_depth in VP9Common as the flag of highbit. Change-Id: I881aefbe68f9c10bb4629a2a5ee1e42a225d5ab7	2014-12-16 21:45:01 -08:00
James Yu	aeeaa67987	VP9 common for ARMv8 by using NEON intrinsics 15 Re-write - vp9_lpf_horizontal_4_dual_neon in vp9_loopfilter_16_neon.c Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 20:00:26 -08:00
Johann	ebc1951c7c	Merge "Use defines for inline and __builtin_prefetch"	2014-12-16 18:04:04 -08:00
Jingning Han	200d93545e	Merge "Fix intra mode update process in vp9_pick_inter_mode"	2014-12-16 17:04:04 -08:00
JackyChen	9931070094	Add rectangle block support for MFQE. Only for the rectangle blocks larger than 16X16, SAD and Variance are still based on the internal square blocks. Change-Id: I3754da1b0254147313f86a0140dbf4f980f06a5a	2014-12-16 16:35:54 -08:00
Johann	4f7060a431	Merge "VP9 common for ARMv8 by using NEON intrinsics 16"	2014-12-16 16:15:48 -08:00
Jingning Han	ccdc448b70	Remove reset mode_info array per frame The mode_info array was unnecessarily reset to zero every frame when error resilient mode turned on, given that the mode info values per block will be assigned during mode search stage. This commit removes this reset operation. It reduces the runtime cost on memset operation to 1/3. The overall speed -6 runtime is reduced by 2%. Change-Id: I32ecb73338d8995cc0c5147de09357364f13d45b	2014-12-16 15:54:24 -08:00
Jingning Han	01613aa753	Set second ref frame to be NONE in key frame coding This commit explicitly set the second reference frame type to be NONE in key frame coding mode. This fixes a subtle dependency of reference motion vector used by next inter frame on mode_info reset before key frame coding. Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a	2014-12-16 15:49:58 -08:00
Johann	2fdbf70d40	Use defines for inline and __builtin_prefetch These were established for compatibility. Make sure to use them. Most frequently they manifest as issues on Visual Studio builds. Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160	2014-12-16 15:21:19 -08:00
Frank Galligan	5fdd0f1fe0	Merge "Revert "Revert "Add support for setting byte alignment."""	2014-12-16 15:14:17 -08:00
James Yu	aa8dd897c1	VP9 common for ARMv8 by using NEON intrinsics 16 Add vp9_reconintra_neon.c - vp9_v_predictor_4x4_neon - vp9_v_predictor_8x8_neon - vp9_v_predictor_16x16_neon - vp9_v_predictor_32x32_neon - vp9_h_predictor_4x4_neon - vp9_h_predictor_8x8_neon - vp9_h_predictor_16x16_neon - vp9_h_predictor_32x32_neon - vp9_tm_predictor_4x4_neon - vp9_tm_predictor_8x8_neon - vp9_tm_predictor_16x16_neon - vp9_tm_predictor_32x32_neon Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 12:57:52 -08:00
James Yu	ba05a4c640	VP9 common for ARMv8 by using NEON intrinsics 19 Delete vp9_dc_only_idct_add_neon.c The function was merged with vp9_short_idct4x4_1_add (later vp9_idct4x4_1_add) in `d2de1ca` and should have been deleted then. Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 11:14:12 -08:00
JackyChen	603cdcfce5	Merge "Fixed MFQE crash issue for highbit depth."	2014-12-16 11:12:03 -08:00
JackyChen	e7bad92689	Fixed MFQE crash issue for highbit depth. Check the flags, no MFQE for highbit now. Will add highbit support later. Change-Id: I548c27593e0f47ab7f4c92b45f14fb037dc86591	2014-12-16 10:07:38 -08:00
Jingning Han	581c8dbd33	Merge "Initialize best_tx_size with invalid value"	2014-12-16 10:01:03 -08:00
Yaowu Xu	b60ae45f36	Merge "Prevent decoder from using uninitialized entropy context."	2014-12-16 09:30:24 -08:00
Jingning Han	b47f9c5802	Merge "Use right shift to replace division in vp9_pick_inter_mode"	2014-12-16 09:26:51 -08:00
Paul Wilkins	b6c75c5a8d	Improve motion detection for low complexity regions. Where there is very subtle motion, especially when combined with low spatial complexity, the codec sometimes fails to quickly pick up the ambient motion field. Once it has been established though the field propagates well using Nearest and Near MV. This patch looks specifically at the case where the Nearest and Near have not been established as non zero vectors and in this case discounts the cost of searching for a new vector in the rd code. This will almost certainly have some implications in terms of encode speed but it should be possible to mitigate the impact in a subsequent patch using first pass stats and the local spatial complexity. Average results for test sets approximately neutral. Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524	2014-12-16 17:22:54 +00:00
Debargha Mukherjee	4ebdb4a1b9	Merge "Fix for crash in highbitdepth rt mode"	2014-12-16 06:41:54 -08:00
Jim Bankoski	abc5a66770	Merge "Fix the comments."	2014-12-16 06:25:01 -08:00
Peter de Rivaz	e3d19bfc63	Fix for crash in highbitdepth rt mode Change 72141 introduced a new use of vp9_avg_4x4. This call needs to switch to using vp9_highbd_avg_4x4 when performing high bitdepth encodes. Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1	2014-12-16 10:55:49 +00:00
Jingning Han	df3e3ab6ff	Fix intra mode update process in vp9_pick_inter_mode When multiple intra modes are tested, the previous mode info update process may overwrite the selected best intra mode and make the final selection use an inter mode. This commit fixes this issue by moving the mode_info reset outside the intra mode search loop. Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b	2014-12-15 17:52:09 -08:00
Johann	1d059fa23e	Merge "VP9 common for ARMv8 by using NEON intrinsics 06"	2014-12-15 14:49:33 -08:00
Johann	37ea1e1218	Merge "VP9 common for ARMv8 by using NEON intrinsics 05"	2014-12-15 14:48:53 -08:00
Jingning Han	5c93dca3d3	Merge "Simplify rate-distortion modeling function"	2014-12-15 14:37:19 -08:00
Jingning Han	c2c7596fc7	Initialize best_tx_size with invalid value If vp9_pick_inter_mode works properly, it should at least check one coding mode and hence get best_tx_size assigned a valid value. There is no need to initialize best_tx_size with a legitimate value before starting the mode search. Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba	2014-12-15 12:58:34 -08:00
Jingning Han	83e2c62aba	Use right shift to replace division in vp9_pick_inter_mode Make the variable reduction_fac log2 based and explicitly use right shift when computing intra_cost_penalty. Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86	2014-12-15 12:48:07 -08:00
Frank Galligan	c4f7079ad4	Revert "Revert "Add support for setting byte alignment."" This reverts commit `91471d6aad`. Fixes the compile issues if post_proc is enabled. Change-Id: Ib40a15ce2c194f9b5adfa65a17ab01ddf60f5a59	2014-12-15 12:20:37 -08:00
James Yu	4f856cd7fa	VP9 common for ARMv8 by using NEON intrinsics 06 Add vp9_iht8x8_add_neon.c - vp9_iht8x8_64_add_neon The assembly did not previously implement tx_type 0 BUG=716 Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:18:06 -08:00
James Yu	6b71013277	VP9 common for ARMv8 by using NEON intrinsics 05 Add vp9_iht4x4_add_neon.c - vp9_iht4x4_16_add_neon The assembly did not previously implement tx_type 0 BUG=715 Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:16:19 -08:00
James Zern	8d558f2ca5	Merge "vp9/MACROBLOCKD: reorder struct members"	2014-12-15 11:54:51 -08:00
Jingning Han	eefe869291	Simplify rate-distortion modeling function Use left shift to replace one multiplication. The computation outcome remains identical. Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1	2014-12-15 11:51:16 -08:00
Paul Wilkins	91471d6aad	Revert "Add support for setting byte alignment." Fails to compile. Bad calls to vp9_alloc_frame_buffer and vp9_realloc_frame_buffer in postproc.c This reverts commit `399823b6f5`. Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a	2014-12-15 11:54:13 +00:00
James Zern	c58c579ec4	vp9/MACROBLOCKD: reorder struct members improves locality of reference Change-Id: I0639b98bf38879f918173b3a1b25dd93090e88b4	2014-12-12 18:01:24 -08:00
James Zern	089086bc25	Merge "Optimize bit_read_buffer."	2014-12-12 16:29:42 -08:00
Frank Galligan	9c2601eb68	Merge "Add support for setting byte alignment."	2014-12-12 15:47:11 -08:00
hkuang	3cecce916b	Optimize bit_read_buffer. Change-Id: Iee43c34909deec9787b29c1c33672213b9f049df	2014-12-12 14:38:12 -08:00
James Zern	89ee8923a8	Merge "Remove redundant loads on 1d16_v8 filter."	2014-12-12 14:32:52 -08:00
James Zern	f82d7fd854	Merge "Remove redundant loads on 1d8_v8 filter."	2014-12-12 14:32:26 -08:00
James Zern	4d40a046da	Merge "vp9: move encoder-only member from common"	2014-12-12 14:28:55 -08:00
James Zern	2bf4b4852f	Merge changes Id6421838,I37499329 * changes: vp9: make postproc members depend on CONFIG_VP9_POSTPROC vp9_postproc: remove redundant CONFIG_* checks	2014-12-12 14:27:56 -08:00
Marco	7f59cff53d	Merge "Allow for 4x4 prediction blocks for key frame, speed 6."	2014-12-12 14:27:31 -08:00
James Zern	5ccff43292	Merge "vp9_loopfilter_mmx: remove some unused tables"	2014-12-12 14:25:53 -08:00
Frank Galligan	399823b6f5	Add support for setting byte alignment. Add support for setting byte alignment on the Y, U, and V plane of the reference buffers. The byte alignment must be a power of 2, from 32 to 1024. A value of 0 sets legacy alignment. Change-Id: I7c1399622f7aa68e123646369216b32047dda73d	2014-12-12 13:34:36 -08:00
James Zern	6d1a63a02a	Merge "Remove unnecessary dqcoeff memset."	2014-12-12 12:16:32 -08:00
Frank Galligan	6a24dbd71f	Remove redundant loads on 1d16_v8 filter. This CL showed about a 3% gain in performance on some systems. Change-Id: Id27e7e0b8e69068aa364e67859436da852669250	2014-12-12 11:48:47 -08:00
Frank Galligan	44ee777905	Remove redundant loads on 1d8_v8 filter. This CL showed a modest gain in performance on some systems. Change-Id: Iad636a89a1a9804ab7a0dea302bf2c6a4d1653a4	2014-12-12 11:34:24 -08:00
James Zern	72ece1308b	vp9: move encoder-only member from common allow_comp_inter_inter VP9_COMMON -> VP9_COMP Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6	2014-12-12 11:17:44 -08:00
James Zern	ef06de33fe	vp9: make postproc members depend on CONFIG_VP9_POSTPROC Change-Id: Id64218386968cee3132269e4a0572650f20fd980	2014-12-12 11:17:17 -08:00
James Zern	890f7bedf3	vp9_postproc: remove redundant CONFIG_* checks the entire module is wrapped in CONFIG_VP9_POSTPROC which is forcibly enabled with CONFIG_INTERNAL_STATS + a similar change in vp9_alloccommon.c Change-Id: I374993297a9fba5bef2f0b71f984eba42f0995a3	2014-12-12 11:17:16 -08:00
James Zern	d456ccbc9d	vp9_loopfilter_mmx: remove some unused tables Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb	2014-12-12 11:16:24 -08:00
Jim Bankoski	d916b0f22f	Merge "vp9_dx_iface.c uses CONFIG_VP9_POSTPROC but config.h not included"	2014-12-12 11:10:17 -08:00
Jingning Han	3e0793b80b	Merge "Fix PICK_MODE_CONTEXT index in non-RD coding mode"	2014-12-12 09:16:01 -08:00
Jim Bankoski	c67859f737	vp9_dx_iface.c uses CONFIG_VP9_POSTPROC but config.h not included Change-Id: Id316b3786214bf1028992968955da917e3f2d4a3	2014-12-12 08:42:36 -08:00
Jingning Han	e2c2a65695	Fix PICK_MODE_CONTEXT index in non-RD coding mode This commit fixes a bug in the PICK_MODE_CONTEXT index for horizontal partition case. The compression performance change is less than 0.01% level, since most blocks are selected to use square block size in RTC coding mode. Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1	2014-12-11 17:21:24 -08:00
JackyChen	3425d6c83e	Merge "Multiframe Quality Enhancement(MFQE) in VP9."	2014-12-11 16:24:08 -08:00
Marco	7e99cd2a9b	Allow for 4x4 prediction blocks for key frame, speed 6. For key frame under variance source partition: 4x4 prediction blocks may be selected when variance of 8x8 block is very high (threshold is set fairly high for now). Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame. Encoded key frame size increases by ~10%. Key frame PSNR increases by ~0.1-0.2dB. Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174	2014-12-11 15:36:16 -08:00
Jingning Han	811c74cdfa	Merge "Replace division with bit shift in choose_partitioning"	2014-12-11 13:30:03 -08:00
Debargha Mukherjee	dd33c656da	Merge "Corrected optimization of 8x8 DCT code"	2014-12-11 12:28:45 -08:00
hkuang	3c7a06c3cc	Remove unnecessary dqcoeff memset. dqcoeff is set to 0 on initialization and set back to 0 every time after being used. Change-Id: I32b8e149bba40a8d707849f737a8e49a691f319c	2014-12-11 12:27:25 -08:00
Jingning Han	d9892e846f	Merge "Refactor choose_partitioning computing scheme"	2014-12-11 11:14:07 -08:00
Jingning Han	d5c396a902	Replace division with bit shift in choose_partitioning This commit explicitly uses the bit shift operation instead of division for computing block variance. Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74	2014-12-11 11:06:57 -08:00
Alexander Voronov	6c6a97814f	Prevent decoder from using uninitialized entropy context. If decoding starts with intra-only frame, there is a possibility of using uninitialized entropy context, which leads to undefined behavior. Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af	2014-12-11 20:44:19 +03:00
Peter de Rivaz	5c22224e9e	Corrected optimization of 8x8 DCT code The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit `fd05fb0c21`)	2014-12-11 09:42:57 -08:00
Jingning Han	377d2f027a	Refactor choose_partitioning computing scheme This commit refactors the choose_partitioning function. It removes redundant memset calls and makes the encoder calculate the variance value per block only when it is needed. It reduces the average runtime cost of choose_partitioning by 60%. Overall it reduces speed -6 runtime by 2-5%. Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9	2014-12-11 09:33:40 -08:00
JackyChen	7ac3e3c1d6	Multiframe Quality Enhancement(MFQE) in VP9. It is the first version of MFQE in VP9. There are a few TODOs included in this version. Usage: Add flag --enable-vp9-postproc to config the project. In decoder, use flag --mfqe in the command line to enable MFQE in postproc. Note: Need to have key frame with low quality to see the effect of this new patch. In my experiment, I fixed the qindex to 200 in key frame. Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396	2014-12-11 09:19:39 -08:00
James Yu	3f7c12dab9	VP9 common for ARMv8 by using NEON intrinsics 18 Add vp9_idct32x32_add_neon.c - vp9_idct32x32_1024_add_neon Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:20:04 -08:00
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
Johann	2d8f581330	Merge "VP9 common for ARMv8 by using NEON intrinsics 07"	2014-12-10 11:40:46 -08:00
Johann	913d0adbaf	Merge "VP9 common for ARMv8 by using NEON intrinsics 04"	2014-12-10 11:40:29 -08:00
Paul Wilkins	65cfb808d0	Merge "Substantial restructuring of AQ mode 2."	2014-12-10 10:44:27 -08:00
Jingning Han	ad19724f1a	Merge "Use use_prev_frame_mvs flag for ref mv search branch"	2014-12-10 09:25:12 -08:00
Jingning Han	6fc289b9c0	Merge "Refactor update_state_rt"	2014-12-10 09:25:05 -08:00
Jingning Han	8bd88a3c83	Merge "Make RTC coding flow support sub8x8 in key frame coding"	2014-12-10 09:24:56 -08:00
Jingning Han	4cda7a1a9a	Merge "Cosmetic naming change"	2014-12-10 09:05:34 -08:00
Jingning Han	fb3cc0ed57	Merge "Take out redundant setting of mode_info from set_block_size"	2014-12-10 09:05:26 -08:00
Jingning Han	161f636809	Merge "Remove unused rd cost calculation from nonrd_use_partition"	2014-12-10 09:05:18 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
Jingning Han	0cac834b5a	Use use_prev_frame_mvs flag for ref mv search branch Replace error_resilient flag with use_prev_frame_mvs in vp9_pick_inter_mode reference motion vector search selection. This effectively turns off the simplified ref mv search in the settings of frame resizing, even if error-resilient mode is off. Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686	2014-12-09 18:18:40 -08:00
Jingning Han	e728678c50	Refactor update_state_rt Update the frame motion vector only if previous frame motion vector is needed for next frame reference motion vector. Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f	2014-12-09 15:35:49 -08:00
hkuang	4eee74d6ed	Fix clang ioc warning due to NULL src_mi pointer. The warning only happens in VP9 encoder's first pass because src_mi is not set up yet. But it will not fail the encoder as left_mi and above_mi are not used in the first pass and they will be set up again in the second pass. Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08	2014-12-09 14:32:48 -08:00
Johann	5810f1b4cd	Merge "VP9 common for ARMv8 by using NEON intrinsics 01"	2014-12-09 13:41:49 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Jingning Han	225cdef665	Make RTC coding flow support sub8x8 in key frame coding This commit enables the use of sub8x8 blocks in RTC key frame encoding. It requires the block size to be preset and will decide the coding mode and encode the bit-stream. Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b	2014-12-09 11:34:58 -08:00
Jingning Han	4bacaab46d	Cosmetic naming change Rename set_modeinfo_offsets as set_mode_info_offsets, to be more consistent with naming convention. Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d	2014-12-09 10:32:04 -08:00
Jingning Han	f051a7beab	Take out redundant setting of mode_info from set_block_size The later encoding process will take the top-left block's mode_info for pre-determined block size. Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5	2014-12-09 10:27:18 -08:00
hkuang	3dfdfd5c86	Merge "Clean up the logic of handling corrupted frame."	2014-12-09 10:23:18 -08:00
Paul Wilkins	e68c8dcfd2	Substantial restructuring of AQ mode 2. The restructure moves the decision into the rd pick modes loop and makes a decision based at the 16x16 block level instead of only the 64x64 level. This gives finer granularity and better visual results on the clips I have tested. Metrics results are worse than the old AQ2 especially for PSNR and this mode now falls between AQ0 and AQ1 in terms of visual impact and metrics results. Further tuning of this to follow. It should be noted that if there are multiple iterations of the recode loop the segment for a MB could change in each loop if the previous loop causes a change in the complexity / variance bin of the block. Also where a block gets a delta Q this will alter the rd multiplier for this block in subsequent recode iterations and frames where the segmentation is applied. Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7	2014-12-09 15:10:52 +00:00
Jingning Han	1395ded2a7	Remove unused rd cost calculation from nonrd_use_partition The per block rd cost calculation is not needed when partition size is preset. Change-Id: Ie5575248bbffb584e908aa13097f697ace6ec747	2014-12-08 18:45:19 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
hkuang	81e5cb86d3	Fix the comments. Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90	2014-12-08 12:44:09 -08:00
hkuang	d05cf10fe7	Add error handling for frame parallel decode and unit test for that. Change-Id: I6e309e11f1641618d2424b7a2c0fe744b8974dec	2014-12-08 12:30:19 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) performance can be improved. In the original code, the PSHUFB uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
hkuang	f925e5ce0f	Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."	2014-12-08 10:24:17 -08:00
Paul Wilkins	127f65531b	Merge "Use average mb energy from first pass in AQ2 test."	2014-12-08 09:01:39 -08:00
Frank Galligan	0f8e8330eb	Merge "Fix potential integer overflow."	2014-12-07 21:37:39 -08:00
James Zern	da464c483f	Merge "vp9 asserts: fix compile warning"	2014-12-05 21:09:42 -08:00
James Zern	3db785facc	Merge "vp9: fix frame-parallel encoding"	2014-12-05 19:00:48 -08:00
Deb Mukherjee	0d367474d0	Merge "Some internal-stats, vp9-highbitdepth bug fixes"	2014-12-05 17:49:52 -08:00
James Zern	6db81fd629	vp9: fix frame-parallel encoding the flag in the header wasn't being set based on the encoder configuration in non-intra only mode broken since: `fbc2fbf` Adding oxcf temp variable. Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53	2014-12-05 17:44:46 -08:00
Jingning Han	bd6bfb93b0	Merge "Remove redundant rdcost reset"	2014-12-05 17:35:07 -08:00
Jingning Han	296afb9440	Merge "Fix a motion search skip condition in vp9_pick_inter_mode"	2014-12-05 17:35:04 -08:00
Jingning Han	3d8d1e374e	Merge "Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter"	2014-12-05 16:59:50 -08:00
hkuang	382f86f945	Improve the performance by caching the left_mi and right_mi in macroblockd. This improves the decode performance by ~2% on Nexus 7 2013. Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3	2014-12-05 16:25:42 -08:00
James Zern	616b3a810f	vp9 asserts: fix compile warning string literal to int within an assert Change-Id: I76a173f96b9add5bf27c3f5ad5d72c6f30e51629	2014-12-05 16:20:42 -08:00
Jingning Han	17bedc54f5	Remove redundant rdcost reset The initial reset of this_rdc in vp9_pick_inter_mode is not needed, since it will be re-assigned when used. Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b	2014-12-05 16:06:17 -08:00
Jingning Han	eadffb2d6e	Fix a motion search skip condition in vp9_pick_inter_mode Compare the current best mode rate-distortion cost with the skip threshold to decide if performing motion search. Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f	2014-12-05 15:58:36 -08:00
Jingning Han	732d57c2b5	Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter Change-Id: I0222f7abc61202f4a83b117bbfb042ada6304562	2014-12-05 15:51:11 -08:00
hkuang	eaa6deee5b	Merge "Merge set_prev_mi function into encoder function."	2014-12-05 15:12:50 -08:00
Deb Mukherjee	37448d3e1f	Some internal-stats, vp9-highbitdepth bug fixes Change-Id: I0363d98f6f6558a43276aec48f27dca37c93f5ad	2014-12-05 13:40:50 -08:00
Jingning Han	6ae829088f	Merge "Remove redundant vp9_zero in choose_partitioning"	2014-12-05 11:47:58 -08:00
Jingning Han	69a9dc5cd3	Merge "Enable conditional skip path in rd_pick_intra_sby_mode"	2014-12-05 11:25:30 -08:00
Jingning Han	62c7356098	Merge "Use hybrid RD and non-RD coding flow for key frame coding"	2014-12-05 11:25:19 -08:00
Jingning Han	9d88b30854	Remove redundant vp9_zero in choose_partitioning It makes the overall speed -6 about 2% faster with no compression performance change. Change-Id: I680a967b421caa2c5a5cdb821311c4726a2df45a	2014-12-05 10:39:39 -08:00
Jingning Han	74ded4863e	Enable conditional skip path in rd_pick_intra_sby_mode These speed-up features for key frame coding are only turned on in the settings of hybrid non-RD and RD mode decision. It provides about 20% speed-up to the hybrid key frame coding at the expense of certain compression performance loss. For vidyo1, the key frame coding statistics are changed 9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us Overall rtc set compression performance is down by -0.257%. Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f	2014-12-05 09:36:09 -08:00
Jingning Han	07711e9b27	Use hybrid RD and non-RD coding flow for key frame coding When block size is below 16x16, the encoder swaps from non-RD to RD mode for key frame coding. This largely brought back the key frame compression performance. For vidyo1 at 1000 kbps, the key frame coding statistics are changed 9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us As compared to the full RD case 7187F, 34.930 dB, 214470 us The overall rtc set coding performance (single key frame setting) is improved by 1.5%. Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878	2014-12-05 09:35:27 -08:00
Yunqing Wang	a3a4a34c60	Merge "vp9_ethread: the tile-based multi-threaded encoder"	2014-12-05 08:23:49 -08:00
Frank Galligan	4c4d7261e4	Fix potential integer overflow. ioc found a potential integer overflow in the rate control. This is related to https://code.google.com/p/webm/issues/detail?id=821 Change-Id: Ib6c4acd6e964972f932fce7490592eb134f2b7ea	2014-12-05 08:02:12 -08:00
Paul Wilkins	bb6e47c1c9	Merge "Increase strength of AQ1."	2014-12-05 04:11:43 -08:00
Debargha Mukherjee	15cf55b3ca	Merge "Use the RTC optimizations when in high bitdepth mode."	2014-12-04 19:22:27 -08:00
James Zern	b43c27ab6e	Merge "vp9_reader: reorder struct members"	2014-12-04 16:08:08 -08:00
Debargha Mukherjee	4bfde1071e	Merge "Corrected the renaming of CONFIG_VP9_HIGH ro CONFIG_VP9_HIGHBITDEPTH."	2014-12-04 15:52:35 -08:00
Peter de Rivaz	a306bd8274	Use the RTC optimizations when in high bitdepth mode. Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5	2014-12-04 15:48:42 -08:00
hkuang	dde819599b	Clean up the logic of handling corrupted frame. No more checking of corrupted reference frame as we skip decoding any non-intra frame in case of frame corrupted. Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073	2014-12-04 15:07:59 -08:00
hkuang	62de07c8c6	Merge set_prev_mi function into encoder function. Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da	2014-12-04 14:44:23 -08:00
Yunqing Wang	eba9c762a1	vp9_ethread: the tile-based multi-threaded encoder Currently, VP9 supports column-tile encoding, which allows a frame to be encoded in multiple column tiles independently. The number of column tiles is set by encoder option "--tile-columns". This provides a way to encode a frame in parallel. Based on previous set of patches, this patch implemented the tile-based multi-threaded encoder. Each thread processes one or more tiles. Usage: For HD clips: --tile-columns=2 --threads=1/2/3/4 While using 4 threads, tests showed that the encoder achieved 2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at realtime speed 5. Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4	2014-12-04 11:21:34 -08:00
Deb Mukherjee	4f860dba78	Merge "Fixes a missing highbitdepth convolve call bug"	2014-12-04 11:19:59 -08:00
Adrian Grange	9065da983f	Merge "Free motion vector array before re-allocating"	2014-12-04 07:08:37 -08:00
Peter de Rivaz	f610f88be4	Corrected the renaming of CONFIG_VP9_HIGH ro CONFIG_VP9_HIGHBITDEPTH. Change 71789 renamed CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH. However, one use of CONFIG_VP9_HIGH was missed. Change-Id: I0ebb9c71380c6d810a25708d15471abf9533e695	2014-12-04 11:01:46 +00:00
Tom Finegan	7339681ee9	Merge "sse2 visual studio build fix"	2014-12-03 18:05:03 -08:00
Deb Mukherjee	70d9dbd818	Fixes a missing highbitdepth convolve call bug Bug was introduced in https://gerrit.chromium.org/gerrit/#/c/72122/ Change-Id: Idb500ea619a30e7bc50e22fb8ee03be5282f41db	2014-12-03 17:48:50 -08:00
Adrian Grange	b56451f488	Merge "Use memset for initialization to 0"	2014-12-03 16:50:39 -08:00
Deb Mukherjee	6615706af2	sse2 visual studio build fix Change-Id: Id8c8c3be882bcd92afea3ccec6ebdf3f208d28ef	2014-12-03 16:35:26 -08:00
Adrian Grange	979ee6e4c9	Free motion vector array before re-allocating Change-Id: I0c39136d67e1e83020d61f86b062a04182ec9b00	2014-12-03 16:07:32 -08:00
Marco	fb20a07c36	Merge "Increase delta-qp for aq=3 mode, after key frame."	2014-12-03 16:03:06 -08:00
Jingning Han	3665f194fa	Merge "Fix indent in source_var_based_partition_search_method"	2014-12-03 15:43:40 -08:00
Adrian Grange	73caef0500	Use memset for initialization to 0 Change-Id: I714ca22b5d51016bf8b035cf457616c707257641	2014-12-03 15:22:02 -08:00
James Zern	d5937cd268	Merge "vp9: sync threads after a longjmp"	2014-12-03 14:30:55 -08:00
Marco	a047e7cdf8	Increase delta-qp for aq=3 mode, after key frame. For a few refresh periods after key frame, use large qp-delta to increase quality ramp-up. Change-Id: Ib5a150fb2dfa6bafd0d4e6b5d28dfd0724b61319	2014-12-03 13:04:45 -08:00
Jingning Han	17176cd452	Fix indent in source_var_based_partition_search_method Change-Id: I6e5e0571d6967b9b992966336715e35bb97f187e	2014-12-03 12:37:36 -08:00
Jingning Han	8f3db5f22e	Merge "Remove unused ONE_LOOP entry from speed feature"	2014-12-03 11:34:42 -08:00
Jingning Han	228ec17ff2	Merge "Rework coeff probability model update for rtc coding"	2014-12-03 11:34:35 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Jingning Han	a8d8c0f633	Remove unused ONE_LOOP entry from speed feature Change-Id: I56ead0ebc2491144c4e79e5859b05e126176702c	2014-12-03 09:17:08 -08:00
Jingning Han	8fe50191c6	Rework coeff probability model update for rtc coding This commit reworks the ONE_LOOP_REDUCED coefficient probability model update process. It allows model update for every coefficient across the spectrum at a coarser resolution, instead of performing precise update only for a certain subset of probability models. The overall runtime remains nearly the same (<1% change) for speed -6. The compression performance is improved by 7.5% in PSNR for speed -5 and 4.57% for speed -6, respectively. Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1	2014-12-03 09:15:25 -08:00
James Zern	6f7ab01451	vp9: sync threads after a longjmp Synchronize all threads immediately as a subsequent decode call may cause a resize invalidating some allocations. fixes one aspect of crbug.com/437655 Change-Id: Ie993b62c2756478543206ddbe43ec6268d90a470	2014-12-02 16:51:27 -08:00
Debargha Mukherjee	99874f55fb	Merge "Reinsert macro to fix issue 884."	2014-12-02 15:32:24 -08:00
Deb Mukherjee	1fbe0c7615	Merge "Fix a warning related to VPX_EFLAG_FORCE_KF check"	2014-12-02 14:03:55 -08:00
Peter de Rivaz	2c886953d1	Reinsert macro to fix issue 884. Change 72056 unfolded some macro definitions, but lost some alternative behaviour required for high bitdepth encodes. This causes the encoder to crash, see issue 884. Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc	2014-12-02 13:45:26 -08:00
Deb Mukherjee	02941b0df2	Fix a warning related to VPX_EFLAG_FORCE_KF check Fixes a warning in chrome build. Change-Id: I8fa0fd3e7ba1aecf89e5f79ce94cd64ed6a9567c	2014-12-02 11:35:52 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
Paul Wilkins	00e3626e13	Use average mb energy from first pass in AQ2 test. AQ2 modified to use mb_av_energy in defining variance thresholds used alongside complexity when defining the segment to be used for an SB64. Slight improvements in metrics (ssim and PSNR). Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81	2014-12-02 16:07:30 +00:00
Marco Paniconi	83fd18977f	Cyclic refresh: factor segment delta-q into rate control. Incorporate segment delta-q into estimated bits. This generally improves the rate control under cyclic refresh (aq=3) mode. Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037	2014-12-01 16:56:43 -08:00
Jingning Han	f59cb45e90	Merge "Remove repeated search_type_check_frequency assign"	2014-12-01 14:02:10 -08:00
Yunqing Wang	7af927e324	Merge "vp9_ethread: calculate and save the tok starting address for tiles"	2014-12-01 12:49:03 -08:00
Paul Wilkins	0d3d6e0e31	Increase strength of AQ1. This patch greatly increases the strength of AQ1. Visual tests show strong gains on many clips but there is a big hit on psnr. SSIM is more mixed with some winners and losers. Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1	2014-11-27 10:53:37 +00:00
Jingning Han	a6df0cbcca	Remove repeated search_type_check_frequency assign This parameter is initialized as 50. No need to re-assign the same value in speed -6. Change-Id: I8735a5593412df2fdcee53ae45c8ebd1c3d792e7	2014-11-25 18:36:41 -08:00
Yunqing Wang	0993bef7e9	vp9_ethread: calculate and save the tok starting address for tiles Each tile's tok starting address is calculated before the encoding process. These addresses are stored so that the same calculation won't be done again in packing bit stream. Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50	2014-11-25 17:19:35 -08:00
Yaowu Xu	e4234b3f8b	Separate rate_correction_factor for boosted GFs When the golden frame is boosted, the rate correction factor is not correlated well with other inter frames even in CBR mode. This commit changes to use GF specific rate_correction_factor when gf_cbr_boost is greater than 20%. Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab	2014-11-25 14:32:07 -08:00
Jingning Han	a04ed98482	Cosmetic change in vp9_pick_inter_mode Change-Id: Ic072585ebffdb36982ed7b8b9f875ca6c1c656c4	2014-11-25 09:42:57 -08:00
Jingning Han	92a7cfc8bf	Adaptively adjust mode test kick-off thresholds in RTC coding This commit allows the encoder to increase the mode test kick-off thresholds if the previous best mode renders all zero quantized coefficients, thereby saving motion search runs when possible. The compression performance of speed -5 and -6 is down by -0.446% and 0.591%, respectively. The runtime of speed -6 is improved by 10% for many test clips. vidyo1, 1000 kbps 16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms nik720p, 1000 kbps 33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms dark720p, 1000 kbps 33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms mmoving, 1000 kbps 33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90	2014-11-25 09:42:08 -08:00
Jingning Han	30104207fd	Merge "Rework forward txfm/quantization skip system in RTC coding mode"	2014-11-25 09:33:57 -08:00
Jingning Han	6912c44135	Merge "Remove redundant intra mode penalty from vp9_pick_inter_mode"	2014-11-24 22:13:44 -08:00
James Zern	e1f55e0441	vp9_reader: reorder struct members improves locality of reference Change-Id: Ia4d55bb8c98e479528d88303fa35e8c74fbf939d	2014-11-24 22:10:39 -08:00
Yunqing Wang	edbd61e136	vp9_ethread: modify VP9_COMP structure This patch modified struct VP9_COMP. Created a struct ThreadData to include data that need to be copied for each thread. In multiple thread case, one thread processes one tile. All threads share one copy of VP9_COMP (refer to VP9_COMP cpi in the code), but each thread has its own copy of ThreadData (refer to ThreadData td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In single thread case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e	2014-11-24 17:57:38 -08:00
Alex Converse	60ef6c0735	Merge "Fix a tautological assert."	2014-11-24 16:36:53 -08:00
Alex Converse	0496d11486	Fix a tautological assert. Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38	2014-11-24 15:01:01 -08:00
Jingning Han	25be81e2dd	Remove redundant intra mode penalty from vp9_pick_inter_mode The intra mode penalty is covered by intra_cost_penalty. This commit removes the other intra cost threshold, provided that the constant 50 is negligible in normal rate-distortion cost. Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72	2014-11-24 14:55:59 -08:00
Jingning Han	e6fb9c0b0b	Merge "Key frame non-RD mode decision process"	2014-11-24 13:21:56 -08:00
Debargha Mukherjee	e9d9f1adab	Merge "Refactored idct routines and headers"	2014-11-24 12:47:03 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Peter de Rivaz	3a8c43a479	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit `454342d4e7`)	2014-11-24 09:57:40 -08:00
Jingning Han	2fbdfd2c66	Key frame non-RD mode decision process This commit makes a non-RD coding mode decision process for key frame coding. It can be optionally turned on in speed -6 and above. Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5	2014-11-24 09:04:28 -08:00
Marco	681d5e9024	Merge "Only allow for cyclic refresh (aq=3 mode) for base layer."	2014-11-24 07:46:36 -08:00
Paul Wilkins	2232c3e34b	Merge "Fix some minor nits."	2014-11-21 17:39:43 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Paul Wilkins	d28e9ed452	Merge changes Ie077edd0,Id31a74fc * changes: Remove rate component adjustment for AQ1 Switch AQ1 segment basis from q ratio to rate ratio.	2014-11-21 15:38:32 -08:00
Paul Wilkins	771259fe10	Merge "Add adaptive midpoint for AQ1."	2014-11-21 15:26:18 -08:00
Paul Wilkins	6dbf83d082	Merge "Add variance restriction to AQ2."	2014-11-21 15:25:43 -08:00
Marco	53c3f2ca4d	Only allow for cyclic refresh (aq=3 mode) for base layer. Condition existed for temporal case, added it for spatial as well. Issue: https://code.google.com/p/webm/issues/detail?id=878. Change-Id: I38339207f9a94924f5568a081eabe64f867a686d	2014-11-21 14:47:32 -08:00
Paul Wilkins	ea494c0e76	Fix some minor nits. Change-Id: Ib8810d431fa20a2c78e0caaa28eb2c99903e60fb	2014-11-21 14:13:59 -08:00
Paul Wilkins	a867bb538b	Merge "Further AQ1 clean up."	2014-11-21 12:58:03 -08:00
Jingning Han	7428cebe4f	Rework forward txfm/quantization skip system in RTC coding mode This commit allows a more aggressive decision to skip forward transform and quantization for the luma component in RTC coding mode. The chroma components still go through the normal coding routine, since they are not included in the non-RD mode search process. It reduces the runtime cost by 2% - 10%. In speed -6: vidyo1 1000 kbps 16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms; nik720p 1000 kbps 33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms; dark720p 1000 kbps 33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms. The compression performance of speed -6 is improved by 0.44% in PSNR and 1.31% in SSIM. Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7	2014-11-21 12:46:40 -08:00
Paul Wilkins	b87c51ce55	Merge "Initial AQ1 restructuring."	2014-11-21 12:10:03 -08:00
Paul Wilkins	f5209d7e01	Remove rate component adjustment for AQ1 In AQ1 a rate adjustment was applied for blocks coded with a deltaq. This tends to skew the partition selection and cause rate overshoot. For example, consider a 64x64 super block where some but not all sub blocks are in a low q segment and some are in a high q segment. The choice of Q when considering large partition and transform sizes is defined by the lowest sub block segment id (currently this implies the lowest Q). If some parts of the larger partition are very hard, this will cause a high rate component. The correct behavior here is for the rd code to discard the large partition choice and break down to sub blocks where some have low and some have high Q. However, the rate correction factor above masks the high cost of coding at a larger partition size. Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960	2014-11-21 08:51:58 -08:00
Paul Wilkins	1663eff7f8	Switch AQ1 segment basis from q ratio to rate ratio. In defining the Q deltas for segments in AQ1 use a rate ratio rather than a q ratio. Change-Id: Id31a74fcf2b7e55437e42a51c21b3cbcb57028d4	2014-11-21 08:50:57 -08:00
Paul Wilkins	fc47c5d653	Add adaptive midpoint for AQ1. Make the midpoint variance used in AQ mode 1 segmentation depend on the overall complexity of the frame in two pass. Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd	2014-11-20 18:37:34 -08:00
Alex Converse	bc1b3d8412	Allow DC/H/V/TM on screen content. 6.3% better compression, less than 1% compression time increase. Change-Id: Ie83c059436e54c09de9e7c87e06e0a6d40dc38fe	2014-11-20 18:04:57 -08:00
Alex Converse	722e9d611b	Drop special inter mode selection for screen content. Better mode selection was implemented for all content. Change-Id: I479778ed21d3968892f4dce396c83733583f4f23	2014-11-20 18:04:57 -08:00
Yunqing Wang	72522dbc86	Merge "vp9_ethread: move filter_cache out of RD_OPT struct"	2014-11-20 16:51:31 -08:00
Paul Wilkins	d031237999	Add variance restriction to AQ2. Add an additional restriction to bit/complexity based segmentation based on spatial variance. Only lower Q when both the number of bits spent in the initial encoding pass and the spatial complexity are below a threshold. This will prevent the low Q segments being used just because there is a surfeit of bits. Small metrics gains, especially opsnr: derf ~0.2%, std-hd ~0.3%. Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce	2014-11-20 16:23:35 -08:00
Paul Wilkins	3d1e8c9a85	Further AQ1 clean up. Further patch to restructure AQ mode 1. Change-Id: I566452a033d047a49a40441a7be24690ea69412d	2014-11-20 16:00:51 -08:00
Paul Wilkins	6a760d483d	Initial AQ1 restructuring. This is the first of a series of patches to restructure and improve AQ mode 1 (variance based AQ). Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a	2014-11-20 15:50:15 -08:00
Paul Wilkins	b74eeb8675	Merge "Fix bug in calculating number of mbs with scaling."	2014-11-20 15:45:41 -08:00
Yunqing Wang	54ba65a63e	Merge "vp9_ethread: move max/min partition size to mb struct"	2014-11-20 14:00:37 -08:00
Yunqing Wang	379334c2d8	vp9_ethread: move filter_cache out of RD_OPT struct Similar to mask_filter, the filter_cache in RD_OPT struct can be moved out, and declared as a local variable since it is only used in pick_inter_mode functions. Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17	2014-11-20 13:44:16 -08:00
Yunqing Wang	0b71fdbf80	Merge "vp9_ethread: change mask_filter to a local variable"	2014-11-20 13:02:55 -08:00
Yunqing Wang	bdaa3eaf43	Merge "Revert "vp9_ethread: include a pointer to mb in VP9_COMP""	2014-11-20 12:27:34 -08:00

7684 Commits