generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	216b171d63	Merge "Integral projection based motion estimation"	2015-02-19 15:08:11 -08:00
Jingning Han	ed2dc59c1b	Integral projection based motion estimation This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region is projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The overall CPU cycle cost due to this additional 64x64 block match (SSE2 version) takes around 2% at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were tested in RTC speed -6 setting: cloud 1080p 500 kbps 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster) pedestrian_area 1080p 500 kbps 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings) blue_sky 1080p 500 kbps 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster) jimred 400 kbps 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower) Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c	2015-02-19 13:47:19 -08:00
James Zern	0dd591bedd	loop_filter_rows_mt: remove dependency on 'last_height' using this to control reallocation would miss a change if the function were not called for every frame. fixes potential memory corruption by the subsequent memset Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33	2015-02-13 19:11:23 -08:00
Yunqing Wang	238707ab4c	Merge "Make vp9_print_modes_and_motion_vectors() work"	2015-02-11 16:58:52 -08:00
James Zern	d8ed558c99	Merge "vp9_thread: prefer pthread.h if available"	2015-02-11 16:50:07 -08:00
James Zern	923cc0bf51	vp9_highbd_tm_predictor_16x16: fix win64 by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb	2015-02-10 19:34:12 -08:00
Yunqing Wang	f37788eaf6	Make vp9_print_modes_and_motion_vectors() work MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors() didn't work anymore. This patch modified vp9_debugmodes.c so that this function works again for debug usage. Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a	2015-02-10 16:37:02 -08:00
James Zern	d167a1aeee	vp9_thread: prefer pthread.h if available this avoids conflicts with recent versions of mingw-w64 (tested g++ 4.8.2) and the unit tests Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088	2015-02-10 12:47:14 -08:00
Yunqing Wang	84b813aa42	Merge "Make encoder and decoder share common thread function"	2015-02-10 09:06:41 -08:00
Yunqing Wang	d3a37731c2	Merge "Rename loopfilter_thread files to thread_common files"	2015-02-10 09:06:23 -08:00
hkuang	dd88f48296	Set the maximum decode threads to be 8. This will fix the frame parallel decode hang on windows due to not enough semaphores. This will also make the frame parallel decode safer as the number of frame buffers could only support maximum 8 threads. Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309	2015-02-09 10:38:41 -08:00
Yunqing Wang	4ae092c660	Make encoder and decoder share common thread function Moved vp9_accumulate_frame_counts to vp9_thread_common.c to eliminate the duplicate code. Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a	2015-02-06 11:45:51 -08:00
Yunqing Wang	41063137c3	Rename loopfilter_thread files to thread_common files Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b	2015-02-06 10:03:31 -08:00
Jingning Han	0c6d3a03e1	Account for chroma component costs in RTC mode decision This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c	2015-02-04 09:45:14 -08:00
hkuang	70554a21f1	Merge "Remove duplicate code."	2015-02-03 13:37:48 -08:00
Jim Bankoski	9f1cf2c8cf	make low bitrates a lot less blocky Remove loop filter skip at speed 7+ because of bad visual artifacts and up the postprocessing. Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d	2015-02-03 06:45:56 -08:00
Yaowu Xu	80e729f601	Merge "Optimize coef update"	2015-02-01 20:08:29 -08:00
hkuang	be6aeadaf4	Try again to merge branch 'frame-parallel' into master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not flushed, it only returns a frame when all the cpus are busy. If flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example, decoding frame parallel videos with segmentation enabled is sometimes incorrect. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit `a18da9760a`. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02	2015-01-30 21:00:13 -08:00
James Zern	f6c2a6c5d6	vp9: rename 'near' parameters + nearest for consistency near is a reserved word in windows builds so using it as a parameter name may cause build failures with some configurations Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403	2015-01-30 15:52:24 -08:00
Yaowu Xu	45971abd1d	Optimize coef update 1. move the check of search method of USE_TX_8X8 up one level to avoid operations of build_tree_distributions() 2. count tx used and avoid computation for coef update when one size is not used at all. Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c	2015-01-30 10:16:40 -08:00
hkuang	e8c42fb0bd	Remove duplicate code. (issue #934). Change-Id: Ic8adaaff87aae0b33d9b508f160b48e0ccdaaf4c	2015-01-28 12:00:34 -08:00
Frank Galligan	e3167f7fbf	Add vp9_sad32x32x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c	2015-01-27 08:54:00 -08:00
Frank Galligan	9f574d0316	Add vp9_sad16x16x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9	2015-01-27 08:42:17 -08:00
Frank Galligan	54fa956715	Add vp9_sad64x64x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58	2015-01-27 08:33:40 -08:00
Frank Galligan	9f6eba419a	Add Neon intrinsic vp9_fdct8x8_quant_neon On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51	2015-01-24 22:49:50 -08:00
Yaowu Xu	643c75d90b	Merge "Replace divide with look-up"	2015-01-23 21:12:18 -08:00
JackyChen	65f60f8e8c	Merge "SSE2 code for the filter in MFQE."	2015-01-23 11:08:16 -08:00
Yaowu Xu	eda179764f	Replace divide with look-up This commit replaces an integer divide with a table-lookup. It is to improve decoding speed, and at the same time, to reduce possible complications with a bug in AMD Family 12h processors: "665 Integer Divide Instruction May Cause Unpredictable Behavior" Change-Id: I678b707a538798a923850bac467e66e847e6def7	2015-01-23 09:02:07 -08:00
Johann	a18da9760a	Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch." This reverts commit `bde04ce503` Change-Id: I053dae04c761b04a36dc239558503905a14d2470	2015-01-23 08:42:02 -08:00
hkuang	bde04ce503	Merge branch 'frame-parallel' to enable frame parallel decode in master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not flushed, it only returns a frame when all the cpus are busy. If flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading, which will make it easier for devices to play 1080P VP9 videos. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64	2015-01-22 18:18:53 -08:00
Frank Galligan	469ff48d7b	Merge "Add Neon intrinsics for vp9_avg_8x8_neon"	2015-01-20 14:38:39 -08:00
Yunqing Wang	6d7b7abf52	Add non420 code in multi-threaded loopfilter Added non420 part back to make it consistent with single thread code in vp9_loopfilter.c. Change-Id: I8ca255d73bffebae294d2627d6655eafe535cb90	2015-01-20 09:31:47 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
Yunqing Wang	e76eaf05b1	vp9_ethread: add parallel loopfilter 1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4	2015-01-16 17:19:27 -08:00
Frank Galligan	6e7e1cf32f	Add Neon intrinsics for vp9_avg_8x8_neon On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee	2015-01-15 15:32:40 -08:00
Yaowu Xu	829a01dbb7	Merge "Add encoder control for setting color space"	2015-01-14 14:14:34 -08:00
Yaowu Xu	e94b415c34	Add encoder control for setting color space This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650	2015-01-14 10:17:14 -08:00
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
James Zern	4d6838627d	Merge "vp9: add per-tile longjmp error handling"	2015-01-08 15:53:37 -08:00
Johann	00bbe342c2	Merge "Disable vp9 _8_ loopfilters"	2015-01-08 12:47:52 -08:00
Yaowu Xu	01eec75858	Merge "Refactor calculation of tile_cols"	2015-01-07 16:24:57 -08:00
JackyChen	1883c940b9	Merge "Use qdiff to adjust the threshold of sad and variance in MFQE."	2015-01-07 14:57:46 -08:00
Yaowu Xu	e9cf9b7dfe	Refactor calculation of tile_cols Change-Id: I2c38ea2bcf6d221a0b6b2fb9be4cebbee21006a3	2015-01-07 14:28:59 -08:00
JackyChen	60cf5cf7b2	Use qdiff to adjust the threshold of sad and variance in MFQE. When qdiff is larger, the sad/variance threshold should also be higher which indicates a more aggressive action on MFQE. Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185	2015-01-07 09:07:10 -08:00
Johann	377b6682f9	Disable vp9 _8_ loopfilters Investigating https://code.google.com/p/chromium/issues/detail?id=443839 Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb	2015-01-06 19:26:11 -08:00
Johann	b1ba4cc394	Rearrange loopfilter functions Separate functions and rename files. This will make it easier to disable some functions later to help work around a compiler issue in chromium. Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5	2015-01-06 19:26:11 -08:00
Deb Mukherjee	0c2ee67ad6	Merge "Enable coefficient range checking for 10-/12-bit"	2015-01-06 14:59:08 -08:00
Deb Mukherjee	0ce2a27e9b	Enable coefficient range checking for 10-/12-bit Also fixes a broken build with --enable-coefficient-range-checking configuration option. Change-Id: Icc536f53088e8cec59dfb8f635668555fdb9125e	2015-01-06 02:40:51 -08:00
JackyChen	fe23539d58	Adopt weighted averaging in MFQE. By using weighted averaging in the calculation of the frames to be displayed, we get an average gain of more than 1 dB for key frames whose base qp is 20 higher than that of non-key frames. Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c	2015-01-05 11:38:42 -08:00
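
Note on ed2dc59c1b ("Integral projection based motion estimation"): the commit message describes projecting a 2-D block and its surrounding region onto horizontal and vertical 1-D vectors, then matching vectors instead of blocks. The following is a minimal sketch of that idea in plain C, not the libvpx code. All names here (project_columns, project_rows, vector_match, integral_projection_search, MAX_BLOCK, MAX_RANGE) are hypothetical, the exhaustive 1-D SAD search is an assumption, and the motion vector is defined as the displacement from the current block position to the best-matching reference position.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_BLOCK 64 /* hypothetical cap on the block size */
#define MAX_RANGE 32 /* hypothetical cap on the search range */

/* Horizontal projection: sum each of the w columns over h rows. */
static void project_columns(const unsigned char *src, int stride,
                            int w, int h, int *proj) {
  for (int c = 0; c < w; ++c) {
    int sum = 0;
    for (int r = 0; r < h; ++r) sum += src[r * stride + c];
    proj[c] = sum;
  }
}

/* Vertical projection: sum each of the h rows over w columns. */
static void project_rows(const unsigned char *src, int stride,
                         int w, int h, int *proj) {
  for (int r = 0; r < h; ++r) {
    int sum = 0;
    for (int c = 0; c < w; ++c) sum += src[r * stride + c];
    proj[r] = sum;
  }
}

/* 1-D vector match: ref holds len + 2 * range entries and ref[range + i]
 * lines up with cur[i] at offset 0. Returns the offset in [-range, range]
 * with the smallest sum of absolute differences (first one wins on ties). */
static int vector_match(const int *ref, const int *cur, int len, int range) {
  int best_offset = 0;
  int best_sad = INT_MAX;
  for (int off = -range; off <= range; ++off) {
    int sad = 0;
    for (int i = 0; i < len; ++i) sad += abs(ref[range + off + i] - cur[i]);
    if (sad < best_sad) {
      best_sad = sad;
      best_offset = off;
    }
  }
  return best_offset;
}

/* cur and ref point at the same block position in the current and reference
 * frames. The reference frame must have at least `range` valid pixels on
 * every side of that position. */
static void integral_projection_search(const unsigned char *cur, int cur_stride,
                                       const unsigned char *ref, int ref_stride,
                                       int bsize, int range,
                                       int *mv_col, int *mv_row) {
  int ref_proj[MAX_BLOCK + 2 * MAX_RANGE];
  int cur_proj[MAX_BLOCK];
  assert(bsize > 0 && bsize <= MAX_BLOCK);
  assert(range >= 0 && range <= MAX_RANGE);

  /* Column sums give the horizontal component of the motion. */
  project_columns(ref - range, ref_stride, bsize + 2 * range, bsize, ref_proj);
  project_columns(cur, cur_stride, bsize, bsize, cur_proj);
  *mv_col = vector_match(ref_proj, cur_proj, bsize, range);

  /* Row sums give the vertical component of the motion. */
  project_rows(ref - range * ref_stride, ref_stride, bsize, bsize + 2 * range,
               ref_proj);
  project_rows(cur, cur_stride, bsize, bsize, cur_proj);
  *mv_row = vector_match(ref_proj, cur_proj, bsize, range);
}
```

In this sketch the two 1-D searches cost on the order of bsize * range operations each once the projections are computed, compared with bsize * bsize * range * range for an exhaustive 2-D block match over the same window. The result is only an approximate alignment, since different blocks can share the same projections.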

2699 Commits