generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	afcb62b414	WIP: Use LUT for y_dequant/uv_dequant instead of calculating every block. Change-Id: Ib19ff2546be8441f8755ae971ba2910f29412029	2015-04-28 07:52:06 -07:00
Yunqing Wang	297b2b99de	Fix debugmodes file to print modes and MVs correctly This patch fixed the issues in debugmodes file because of the recent changes in MODE_INFO struct. Change-Id: I4df83379ecc887c1f009d4a8329c9809c5b299d6	2015-04-27 17:09:38 -07:00
Parag Salasakar	1c9af9833d	Merge "mips msa vp9 convolve8 horiz optimization"	2015-04-21 22:08:25 -07:00
Johann	931c0a954f	Merge "Rename neon convolve avg file"	2015-04-21 15:45:29 -07:00
Johann	66b9933b8d	Rename neon convolve avg file Some build systems use just the basename for object files. Change-Id: I333e1107ee866f3906cc46476ef8d04c6200a8a0	2015-04-21 14:18:17 -07:00
Scott LaVarnway	8b17f7f4eb	Revert "Remove mi_grid_* structures." (see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6) For the test clip used, the decoder performance improved by ~2%. This is also an intermediate step towards adding back the mode_info streams. Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d	2015-04-21 11:16:45 -07:00
Parag Salasakar	ca90d4fd96	mips msa vp9 convolve8 horiz optimization average improvement ~6x-8x Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18	2015-04-21 12:31:26 +05:30
Parag Salasakar	ef51c1ab5b	mips msa vp9 convolve8 hv optimization average improvement ~5x-8x Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b	2015-04-21 09:17:49 +05:30
Parag Salasakar	2e36149ccd	Merge "mips msa vp9 convolve8 vert optimization"	2015-04-18 23:39:25 -07:00
Parag Salasakar	27d083c1b9	mips msa vp9 convolve8 vert optimization average improvement ~6x-10x Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa	2015-04-18 08:13:04 +05:30
Marco Paniconi	f76ccce5bc	Revert "Revert "Force_split on 16x16 blocks in variance partition."" This reverts commit `004b9d83e3` Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743	2015-04-16 17:52:13 -07:00
Johann	14ef4aeafb	Reorganize *_rtcd() calling conventions Change-Id: Ib1e17d8aae9b713b87f560ab5e49952ee2bfdcc2	2015-04-15 11:12:05 -04:00
Yunqing Wang	004b9d83e3	Revert "Force_split on 16x16 blocks in variance partition." This reverts commit `eb8c667570`. The patch caused mismatch while using multi-threads. Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be	2015-04-14 15:19:31 -07:00
Marco	eb8c667570	Force_split on 16x16 blocks in variance partition. Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks. Also increase variance threshold for 32x32, and add exit condition in choose_partition (with very safe threshold) based on sad used to select reference frame. Some visual improvement near moving boundaries. Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%. Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip. Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577	2015-04-13 12:05:07 -07:00
Parag Salasakar	2f693be8f8	Merge "mips msa vp9 common headers added"	2015-04-09 21:50:15 -07:00
Jingning Han	93d9c50419	Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"	2015-04-09 11:16:11 -07:00
Parag Salasakar	481fb7640c	mips msa vp9 common headers added Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223	2015-04-09 15:35:12 +05:30
Jingning Han	7f629dfca4	SSSE3 assembly implementation of 8x8 Hadamard transform It uses about 10% less CPU cycles than the SSE2 intrinsic implementation. Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499	2015-04-04 09:59:37 -07:00
James Zern	44e3640923	Merge "vp9: enable sse4 sad functions"	2015-04-03 14:57:52 -07:00
James Zern	b644384bb5	Merge "vp9: fix high-bitdepth NEON build"	2015-04-01 23:36:17 -07:00
Yaowu Xu	54210f706c	Merge "use MAX_MB_PLANE consistently"	2015-04-01 18:24:39 -07:00
Yaowu Xu	f26b8c84f8	use MAX_MB_PLANE consistently Change-Id: Ic416a7f145001a88f5a7f70dde9b1edbc1b69381	2015-04-01 15:21:20 -07:00
Jingning Han	1470529f62	Refactor block_yrd function for RTC coding mode This commit separates Hadamard transform/quantization operations from rate and distortion computation in block_yrd. This allows one to skip SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips repeated computation of sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10	2015-04-01 12:00:43 -07:00
James Zern	14e24a1297	vp9: enable sse4 sad functions sse4 isn't set by configure or used in rtcd, correct the sad entries to use sse4_1 without changing the signatures for now. this was done in vp8 post-vp9 branch. Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271	2015-03-31 21:00:55 -07:00
James Zern	8845334097	vp9: fix high-bitdepth NEON build remove incorrect specializations in rtcd and update a configuration check in partial_idct_test.cc Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0	2015-03-31 17:45:25 -07:00
hui su	d4f2f1dd5b	Merge "Move vp9_coef_con_tree to common/"	2015-03-31 10:51:10 -07:00
Jingning Han	db5ec37edc	Merge "Enable 16x16 Hadamard transform in SATD based mode decision"	2015-03-31 09:55:41 -07:00
hui su	302e24cb3e	Move vp9_coef_con_tree to common/ This tree should be defined in common/, as it is needed for both encoder and decoder. Change-Id: I4f5cbc80025cf2ced14182c98f7c82dc7d0f87db	2015-03-31 09:20:46 -07:00
Jingning Han	26d3d3af6a	Enable 16x16 Hadamard transform in SATD based mode decision This commit replaces the 16x16 2D-DCT transform with Hadamard transform for RTC coding mode. It reduces the CPU cycles cost on 16x16 transform by 5X. Overall it makes the speed -6 encoding speed 1.5% faster without compromise on compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b	2015-03-30 15:43:31 -07:00
Jingning Han	f0ac5aaa08	Merge "Hadamard transform based coding mode decision process"	2015-03-30 15:43:15 -07:00
Jingning Han	8c411f74e0	Hadamard transform based coding mode decision process This commit uses Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates. For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for niklas720p. This will introduce extra encoding cycle costs at this point. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375	2015-03-30 14:46:05 -07:00
jackychen	68610ae568	vp9_postproc.c: eliminate -Wshadow build warnings. Change-Id: I6df525a9ad1ae3cfbba8710d21db8fee76e64dbb	2015-03-27 20:27:30 -07:00
Alex Converse	a1e20ec58f	Refactor fast loop filter code to handle 444. Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce	2015-03-24 11:17:50 -07:00
hkuang	9f4f98fdbd	Merge "Optimize the intra frame decode to skip some unnecessary copy."	2015-03-23 16:50:37 -07:00
hkuang	85107641a4	Optimize the intra frame decode to skip some unnecessary copy. This speeds up a normal YT style 1080P clip decode by ~1% on nexus 7. Change-Id: Ied7fa0d8bc941b2adb4db9382f549ee4d5654f3a	2015-03-23 10:11:49 -07:00
hkuang	b88dac8938	Safely free all the frame buffers after all the workers finish the work. Issue: 978 Change-Id: Ia7aa809095008f6819a44d7ecb0329def79b1117	2015-03-19 12:21:00 -07:00
Yaowu Xu	73508be364	Fix a typo introduced in #94401aff This fixes all test vector failures Change-Id: Ie1a9fe0f023f7a0c7e89eb55df1b40ff65302adc	2015-03-12 08:01:08 -07:00
hkuang	4a691aa209	Merge "Refactor the block decode code to make it simpler."	2015-03-11 16:19:14 -07:00
hkuang	94401aff5c	Refactor the block decode code to make it simpler. Change-Id: I0f983cb821ad7ec6fbefe7895cb8124a8fa39df6	2015-03-11 11:37:16 -07:00
Yunqing Wang	f0cf9719d0	Accumulate tx_totals counters in multi-threaded encoder Tx_totals counters weren't handled correctly in multi-thread case, which caused the mismatch while encoding using threads > 1. This patch fixed that. Change-Id: Ice9b0386f57175fb92a0bdcd5042686a3106246a	2015-03-10 10:02:49 -07:00
Hangyu Kuang	a1ef75bb63	Merge "Only wait for previous frame's motion vector if needed."	2015-03-06 10:27:26 -08:00
Hangyu Kuang	d5fa786b4f	Only wait for previous frame's motion vector if needed. Change-Id: Iecce685a33b64844446c0009f21bc85566d7469f	2015-03-05 16:09:44 -08:00
Johann	42eb97eb91	Declare function used by 'once' with 'void' parameters Visual Studio is exceptionally picky about this: vp9_reconintra.c(900): warning C4113: 'void (__cdecl )()' differs in parameter lists from 'void (__cdecl )(void)' [.build-x86_64-win64-vs10\vpx.vcxproj] Change-Id: I564c7415f4608fd962be8c699d6133a996b545f7	2015-03-04 15:34:55 -08:00
Adrian Grange	3807dd82ab	Make encoder buffer allocation dynamic Frame buffers are now allocated dynamically on-demand. Entries in the reference frame map, cm->ref_frame_map, may now be set to -1 (INVALID_IDX) to indicate that there is not a valid reference buffer in that "slot". All slots in the reference frame map are now initialized to the empty state (-1) and each buffer is initialized to have a reference count of 0. Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582	2015-03-04 07:58:32 -08:00
Yunqing Wang	55639c383b	fix a race condition caused by intra function pointer initialization This patch fixed webm issue 962. (https://code.google.com/p/webm/issues/detail?id=962) The data races occurred when an encoder and a decoder were created at the same time, and the function pointers were initialized twice. Change-Id: I8851b753c4b4ad4767d6eea781b61f0ac9abb44b	2015-03-03 09:58:37 -08:00
Jingning Han	1790d45252	Use variance metric for integral projection vector match This commit replaces the SAD with variance as metric for the integral projection vector match. It improves the search accuracy in the presence of slight light change. The average speed -6 compression performance for rtc set is improved by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d	2015-03-01 10:42:56 -08:00
Jingning Han	c4cb8059ff	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4"	2015-02-27 09:49:10 -08:00
Jingning Han	43bb97f7d0	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3"	2015-02-27 09:49:00 -08:00
Jingning Han	4800b0e80d	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2"	2015-02-27 09:48:51 -08:00
Jingning Han	8ec22296b3	Fix high bit-depth loop-filter sse2 compiling issue - part 3 Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69	2015-02-26 15:21:22 -08:00
Jingning Han	14ff1cb74a	Fix high bit-depth loop-filter sse2 compiling issue - part 2 Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4	2015-02-26 12:41:19 -08:00
Jingning Han	2080e4b206	Fix high bit-depth loop-filter sse2 compiling issue - part 1 The intrinsic statement _mm_subs_epi16() should take immediate. Feeding variable as its input argument will cause compile failure in older version gcc. Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4	2015-02-25 09:59:50 -08:00
James Zern	044bfa3949	Merge "vp9_loopfilter: quiet integer constant size warnings"	2015-02-24 19:09:32 -08:00
Jingning Han	5b87f1bb5a	Fix high bit-depth loop-filter sse2 compiling issue - part 4 Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb	2015-02-24 14:50:30 -08:00
James Zern	279d350f0b	vp9_loopfilter: quiet integer constant size warnings mark uint64_t constants with 'ULL' Change-Id: I7648e161b4004fba35e1fa7ab79e34cc19e39716	2015-02-24 11:13:16 -08:00
Hangyu Kuang	8724d31d12	Move dequant table from VP9_COMMON to VP9_COMP as decoder does not need it any more. This reduces VP9_COMMON size from 25776 bytes to 17584 bytes(~31%). Change-Id: Ic5daea732ccefb6d512b048af7983f0efe08589b	2015-02-20 11:12:42 -08:00
Jingning Han	216b171d63	Merge "Integral projection based motion estimation"	2015-02-19 15:08:11 -08:00
Jingning Han	ed2dc59c1b	Integral projection based motion estimation This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region is projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The overall CPU cycle cost due to this additional 64x64 block match (SSE2 version) takes around 2% at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were tested in RTC speed -6 setting: cloud 1080p 500 kbps 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster) pedestrian_area 1080p 500 kbps 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings) blue_sky 1080p 500 kbps 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster) jimred 400 kbps 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower) Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c	2015-02-19 13:47:19 -08:00
James Zern	0dd591bedd	loop_filter_rows_mt: remove dependency on 'last_height' using this to control reallocation would miss a change if the function were not called for every frame. fixes potential memory corruption by the subsequent memset Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33	2015-02-13 19:11:23 -08:00
Yunqing Wang	238707ab4c	Merge "Make vp9_print_modes_and_motion_vectors() work"	2015-02-11 16:58:52 -08:00
James Zern	d8ed558c99	Merge "vp9_thread: prefer pthread.h if available"	2015-02-11 16:50:07 -08:00
James Zern	923cc0bf51	vp9_highbd_tm_predictor_16x16: fix win64 by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb	2015-02-10 19:34:12 -08:00
Yunqing Wang	f37788eaf6	Make vp9_print_modes_and_motion_vectors() work MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors() didn't work anymore. This patch modified vp9_debugmodes.c so that this function works again for debug usage. Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a	2015-02-10 16:37:02 -08:00
James Zern	d167a1aeee	vp9_thread: prefer pthread.h if available this avoids conflicts with recent versions of mingw-w64 (tested g++ 4.8.2) and the unit tests Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088	2015-02-10 12:47:14 -08:00
Yunqing Wang	84b813aa42	Merge "Make encoder and decoder share common thread function"	2015-02-10 09:06:41 -08:00
Yunqing Wang	d3a37731c2	Merge "Rename loopfilter_thread files to thread_common files"	2015-02-10 09:06:23 -08:00
hkuang	dd88f48296	Set the maximum decode threads to be 8. This will fix the frame parallel decode hang on windows due to not enough semaphores. This will also make the frame parallel decode safer as the number of frame buffers could only support maximum 8 threads. Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309	2015-02-09 10:38:41 -08:00
Yunqing Wang	4ae092c660	Make encoder and decoder share common thread function Moved vp9_accumulate_frame_counts to vp9_thread_common.c to eliminate the duplicate code. Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a	2015-02-06 11:45:51 -08:00
Yunqing Wang	41063137c3	Rename loopfilter_thread files to thread_common files Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b	2015-02-06 10:03:31 -08:00
Jingning Han	0c6d3a03e1	Account for chroma component costs in RTC mode decision This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c	2015-02-04 09:45:14 -08:00
hkuang	70554a21f1	Merge "Remove duplicate code."	2015-02-03 13:37:48 -08:00
Jim Bankoski	9f1cf2c8cf	make low bitrates a lot less blocky Remove loop filter skip at speed 7+ because of bad visual artifacts and up the postprocessing. Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d	2015-02-03 06:45:56 -08:00
Yaowu Xu	80e729f601	Merge "Optimize coef update"	2015-02-01 20:08:29 -08:00
hkuang	be6aeadaf4	Try again to merge branch 'frame-parallel' into master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example: decode frame parallel videos with segmentation enabled is not right sometimes. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit `a18da9760a`. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02	2015-01-30 21:00:13 -08:00
James Zern	f6c2a6c5d6	vp9: rename 'near' parameters + nearest for consistency near is a reserved word in windows builds so using it as a parameter name may cause build failures with some configurations Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403	2015-01-30 15:52:24 -08:00
Yaowu Xu	45971abd1d	Optimize coef update 1. move the check of search method of USE_TX_8X8 up one level to avoid operations of build_tree_distributions() 2. count tx used and avoid computation for coef update when one size is not used at all. Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c	2015-01-30 10:16:40 -08:00
hkuang	e8c42fb0bd	Remove duplicate code. (issue #934). Change-Id: Ic8adaaff87aae0b33d9b508f160b48e0ccdaaf4c	2015-01-28 12:00:34 -08:00
Frank Galligan	e3167f7fbf	Add vp9_sad32x32x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c	2015-01-27 08:54:00 -08:00
Frank Galligan	9f574d0316	Add vp9_sad16x16x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9	2015-01-27 08:42:17 -08:00
Frank Galligan	54fa956715	Add vp9_sad64x64x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58	2015-01-27 08:33:40 -08:00
Frank Galligan	9f6eba419a	Add Neon intrinsic vp9_fdct8x8_quant_neon On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51	2015-01-24 22:49:50 -08:00
Yaowu Xu	643c75d90b	Merge "Replace divide with look-up"	2015-01-23 21:12:18 -08:00
JackyChen	65f60f8e8c	Merge "SSE2 code for the filter in MFQE."	2015-01-23 11:08:16 -08:00
Yaowu Xu	eda179764f	Replace divide with look-up This commit replaces an integer divide with a table-lookup. It is to improve decoding speed, and at the same time, to reduce possible complications with a bug in AMD Family 12h processors: "665 Integer Divide Instruction May Cause Unpredictable Behavior" Change-Id: I678b707a538798a923850bac467e66e847e6def7	2015-01-23 09:02:07 -08:00
Johann	a18da9760a	Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch." This reverts commit `bde04ce503` Change-Id: I053dae04c761b04a36dc239558503905a14d2470	2015-01-23 08:42:02 -08:00
hkuang	bde04ce503	Merge branch 'frame-parallel' to enable frame parallel decode in master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading which will makes devices play 1080P VP9 videos more easily. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64	2015-01-22 18:18:53 -08:00
Frank Galligan	469ff48d7b	Merge "Add Neon intrinsics for vp9_avg_8x8_neon"	2015-01-20 14:38:39 -08:00
Yunqing Wang	6d7b7abf52	Add non420 code in multi-threaded loopfilter Added non420 part back to make it consistent with single thread code in vp9_loopfilter.c. Change-Id: I8ca255d73bffebae294d2627d6655eafe535cb90	2015-01-20 09:31:47 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
Yunqing Wang	e76eaf05b1	vp9_ethread: add parallel loopfilter 1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4	2015-01-16 17:19:27 -08:00
Frank Galligan	6e7e1cf32f	Add Neon intrinsics for vp9_avg_8x8_neon On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee	2015-01-15 15:32:40 -08:00
Yaowu Xu	829a01dbb7	Merge "Add encoder control for setting color space"	2015-01-14 14:14:34 -08:00
Yaowu Xu	e94b415c34	Add encoder control for setting color space This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650	2015-01-14 10:17:14 -08:00
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
James Zern	4d6838627d	Merge "vp9: add per-tile longjmp error handling"	2015-01-08 15:53:37 -08:00
Johann	00bbe342c2	Merge "Disable vp9 _8_ loopfilters"	2015-01-08 12:47:52 -08:00
Yaowu Xu	01eec75858	Merge "Refactor calculation of tile_cols"	2015-01-07 16:24:57 -08:00
JackyChen	1883c940b9	Merge "Use qdiff to adjust the threshold of sad and variance in MFQE."	2015-01-07 14:57:46 -08:00
Yaowu Xu	e9cf9b7dfe	Refactor calculation of tile_cols Change-Id: I2c38ea2bcf6d221a0b6b2fb9be4cebbee21006a3	2015-01-07 14:28:59 -08:00
JackyChen	60cf5cf7b2	Use qdiff to adjust the threshold of sad and variance in MFQE. When qdiff is larger, the sad/variance threshold should also be higher which indicates a more aggressive action on MFQE. Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185	2015-01-07 09:07:10 -08:00
Johann	377b6682f9	Disable vp9 _8_ loopfilters Investigating https://code.google.com/p/chromium/issues/detail?id=443839 Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb	2015-01-06 19:26:11 -08:00
Johann	b1ba4cc394	Rearrange loopfilter functions Separate functions and rename files. This will make it easier to disable some functions later to help work around a compiler issue in chromium. Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5	2015-01-06 19:26:11 -08:00
Deb Mukherjee	0c2ee67ad6	Merge "Enable coefficient range checking for 10-/12-bit"	2015-01-06 14:59:08 -08:00
Deb Mukherjee	0ce2a27e9b	Enable coefficient range checking for 10-/12-bit Also fixes a broken build with --enable-coefficient-range-checking configuration option. Change-Id: Icc536f53088e8cec59dfb8f635668555fdb9125e	2015-01-06 02:40:51 -08:00
JackyChen	fe23539d58	Adopt weighted averaging in MFQE. By using weighted averaging in the calculation of the frames to be displayed, we get an average gain of more than 1 db for key frames whose base qp are 20 higher than non-key frames. Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c	2015-01-05 11:38:42 -08:00
Jim Bankoski	b3c66f8a2f	WIP: Remove giant value cost table Change-Id: Iabe8a8868a747626c24bb13f1796f4c7827af367	2014-12-23 15:06:17 -08:00
Jingning Han	d0f2377027	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."" This reverts commit `9946ee23e0`. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07	2014-12-22 10:09:25 -08:00
James Zern	953dd1894d	vp9: add per-tile longjmp error handling this avoids longjmp'ing from another thread on error which will cause undesired behavior Change-Id: Ic9074ed8cc4243944bf2539d6e482f213f4e8c86	2014-12-19 11:50:04 -08:00
Paul Wilkins	9946ee23e0	Revert "Removal of legacy zbin_extra / zbin_oq_value." This reverts commit `e9b586e21b`. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4	2014-12-19 15:02:58 +00:00
Paul Wilkins	8ac3f9adaa	Merge "Removal of legacy zbin_extra / zbin_oq_value."	2014-12-19 03:37:02 -08:00
James Zern	b32ba09d35	Merge "make vp9 encoder static initializers thread safe"	2014-12-18 18:48:30 -08:00
Jim Bankoski	cd60930814	make vp9 encoder static initializers thread safe Change-Id: If2d0888d13ebe52bc7c3b16f16319408a86ab6de	2014-12-18 15:50:46 -08:00
Paul Wilkins	e9b586e21b	Removal of legacy zbin_extra / zbin_oq_value. zbin extra / zbin_oq_value was widely passed around, hence removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5	2014-12-18 16:49:11 +00:00
hkuang	b7166143d0	Let YUV plane share the same dqcoeff buffer. Remove unnecessary dqcoeff from macroblockd which reduce macroblockd size by 16384 bytes. Change-Id: Ia379a703b4fee81c8fd4698b52488a85a90c9bc2	2014-12-17 18:29:07 -08:00
JackyChen	9bc7974552	Merge "Add rectangle block support for MFQE."	2014-12-17 15:10:02 -08:00
JackyChen	021e244a51	Merge "Use bit_depth in VP9Common as the flag of highbit."	2014-12-17 09:30:32 -08:00
Jingning Han	00d2211929	Merge "Remove reset mode_info array per frame"	2014-12-17 09:24:44 -08:00
JackyChen	b363cedcd1	Use bit_depth in VP9Common as the flag of highbit. Change-Id: I881aefbe68f9c10bb4629a2a5ee1e42a225d5ab7	2014-12-16 21:45:01 -08:00
James Yu	aeeaa67987	VP9 common for ARMv8 by using NEON intrinsics 15 Re-write - vp9_lpf_horizontal_4_dual_neon in vp9_loopfilter_16_neon.c Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 20:00:26 -08:00
Johann	ebc1951c7c	Merge "Use defines for inline and __builtin_prefetch"	2014-12-16 18:04:04 -08:00
JackyChen	9931070094	Add rectangle block support for MFQE. Only for the rectangle blocks larger than 16X16, SAD and Variance are still based on the internal square blocks. Change-Id: I3754da1b0254147313f86a0140dbf4f980f06a5a	2014-12-16 16:35:54 -08:00
Johann	4f7060a431	Merge "VP9 common for ARMv8 by using NEON intrinsics 16"	2014-12-16 16:15:48 -08:00
Jingning Han	ccdc448b70	Remove reset mode_info array per frame The mode_info array was unnecessarily reset to zero every frame when error resilient mode turned on, given that the mode info values per block will be assigned during mode search stage. This commit removes this reset operation. It reduces the runtime cost on memset operation to 1/3. The overall speed -6 runtime is reduced by 2%. Change-Id: I32ecb73338d8995cc0c5147de09357364f13d45b	2014-12-16 15:54:24 -08:00
Johann	2fdbf70d40	Use defines for inline and __builtin_prefetch These were established for compatibility. Make sure to use them. Most frequently they manifest as issues on Visual Studio builds. Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160	2014-12-16 15:21:19 -08:00
Frank Galligan	5fdd0f1fe0	Merge "Revert "Revert "Add support for setting byte alignment."""	2014-12-16 15:14:17 -08:00
James Yu	aa8dd897c1	VP9 common for ARMv8 by using NEON intrinsics 16 Add vp9_reconintra_neon.c - vp9_v_predictor_4x4_neon - vp9_v_predictor_8x8_neon - vp9_v_predictor_16x16_neon - vp9_v_predictor_32x32_neon - vp9_h_predictor_4x4_neon - vp9_h_predictor_8x8_neon - vp9_h_predictor_16x16_neon - vp9_h_predictor_32x32_neon - vp9_tm_predictor_4x4_neon - vp9_tm_predictor_8x8_neon - vp9_tm_predictor_16x16_neon - vp9_tm_predictor_32x32_neon Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 12:57:52 -08:00
James Yu	ba05a4c640	VP9 common for ARMv8 by using NEON intrinsics 19 Delete vp9_dc_only_idct_add_neon.c The function was merged with vp9_short_idct4x4_1_add (later vp9_idct4x4_1_add) in `d2de1ca` and should have been deleted then. Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 11:14:12 -08:00
JackyChen	e7bad92689	Fixed MFQE crash issue for highbit depth. Check the flags, no MFQE for highbit now. Will add highbit support later. Change-Id: I548c27593e0f47ab7f4c92b45f14fb037dc86591	2014-12-16 10:07:38 -08:00
Yaowu Xu	b60ae45f36	Merge "Prevent decoder from using uninitialized entropy context."	2014-12-16 09:30:24 -08:00
Jim Bankoski	abc5a66770	Merge "Fix the comments."	2014-12-16 06:25:01 -08:00
Johann	1d059fa23e	Merge "VP9 common for ARMv8 by using NEON intrinsics 06"	2014-12-15 14:49:33 -08:00
Johann	37ea1e1218	Merge "VP9 common for ARMv8 by using NEON intrinsics 05"	2014-12-15 14:48:53 -08:00
Frank Galligan	c4f7079ad4	Revert "Revert "Add support for setting byte alignment."" This reverts commit `91471d6aad`. Fixes the compile issues if post_proc is enabled. Change-Id: Ib40a15ce2c194f9b5adfa65a17ab01ddf60f5a59	2014-12-15 12:20:37 -08:00
James Yu	4f856cd7fa	VP9 common for ARMv8 by using NEON intrinsics 06 Add vp9_iht8x8_add_neon.c - vp9_iht8x8_64_add_neon The assembly did not previously implement tx_type 0 BUG=716 Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:18:06 -08:00
James Yu	6b71013277	VP9 common for ARMv8 by using NEON intrinsics 05 Add vp9_iht4x4_add_neon.c - vp9_iht4x4_16_add_neon The assembly did not previously implement tx_type 0 BUG=715 Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:16:19 -08:00
James Zern	8d558f2ca5	Merge "vp9/MACROBLOCKD: reorder struct members"	2014-12-15 11:54:51 -08:00
Paul Wilkins	91471d6aad	Revert "Add support for setting byte alignment." Fails to compile. Bad calls to vp9_alloc_frame_buffer and vp9_realloc_frame_buffer in postproc.c This reverts commit `399823b6f5`. Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a	2014-12-15 11:54:13 +00:00
James Zern	c58c579ec4	vp9/MACROBLOCKD: reorder struct members improves locality of reference Change-Id: I0639b98bf38879f918173b3a1b25dd93090e88b4	2014-12-12 18:01:24 -08:00
Frank Galligan	9c2601eb68	Merge "Add support for setting byte alignment."	2014-12-12 15:47:11 -08:00
James Zern	89ee8923a8	Merge "Remove redundant loads on 1d16_v8 filter."	2014-12-12 14:32:52 -08:00
James Zern	f82d7fd854	Merge "Remove redundant loads on 1d8_v8 filter."	2014-12-12 14:32:26 -08:00
James Zern	4d40a046da	Merge "vp9: move encoder-only member from common"	2014-12-12 14:28:55 -08:00
James Zern	2bf4b4852f	Merge changes Id6421838,I37499329 * changes: vp9: make postproc members depend on CONFIG_VP9_POSTPROC vp9_postproc: remove redundant CONFIG_* checks	2014-12-12 14:27:56 -08:00
Frank Galligan	399823b6f5	Add support for setting byte alignment. Add support for setting byte alignment on the Y, U, and V plane of the reference buffers. The byte alignment must be a power of 2, from 32 to 1024. A value of 0 sets legacy alignment. Change-Id: I7c1399622f7aa68e123646369216b32047dda73d	2014-12-12 13:34:36 -08:00
Frank Galligan	6a24dbd71f	Remove redundant loads on 1d16_v8 filter. This CL showed about a 3% gain in performance on some systems. Change-Id: Id27e7e0b8e69068aa364e67859436da852669250	2014-12-12 11:48:47 -08:00
Frank Galligan	44ee777905	Remove redundant loads on 1d8_v8 filter. This CL showed a modest gain in performance on some systems. Change-Id: Iad636a89a1a9804ab7a0dea302bf2c6a4d1653a4	2014-12-12 11:34:24 -08:00
James Zern	72ece1308b	vp9: move encoder-only member from common allow_comp_inter_inter VP9_COMMON -> VP9_COMP Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6	2014-12-12 11:17:44 -08:00
James Zern	ef06de33fe	vp9: make postproc members depend on CONFIG_VP9_POSTPROC Change-Id: Id64218386968cee3132269e4a0572650f20fd980	2014-12-12 11:17:17 -08:00
James Zern	890f7bedf3	vp9_postproc: remove redundant CONFIG_* checks the entire module is wrapped in CONFIG_VP9_POSTPROC which is forcibly enabled with CONFIG_INTERNAL_STATS + a similar change in vp9_alloccommon.c Change-Id: I374993297a9fba5bef2f0b71f984eba42f0995a3	2014-12-12 11:17:16 -08:00
James Zern	d456ccbc9d	vp9_loopfilter_mmx: remove some unused tables Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb	2014-12-12 11:16:24 -08:00
JackyChen	3425d6c83e	Merge "Multiframe Quality Enhancement(MFQE) in VP9."	2014-12-11 16:24:08 -08:00
Alexander Voronov	6c6a97814f	Prevent decoder from using uninitialized entropy context. If decoding starts with intra-only frame, there is a possibility of using uninitialized entropy context, what leads to undefined behavior. Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af	2014-12-11 20:44:19 +03:00
Peter de Rivaz	5c22224e9e	Corrected optimization of 8x8 DCT code The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit `fd05fb0c21`)	2014-12-11 09:42:57 -08:00
JackyChen	7ac3e3c1d6	Multiframe Quality Enhancement(MFQE) in VP9. It is the first version of MFQE in VP9. There are a few TODOs included in this version. Usage: Add flag --enable-vp9-postproc to config the project. In decoder, use flag --mfqe in the command line to enable MFQE in postproc. Note: Need to have key frame with low quality to see the effect of this new patch. In my experiment, I fixed the qindex to 200 in key frame. Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396	2014-12-11 09:19:39 -08:00
James Yu	3f7c12dab9	VP9 common for ARMv8 by using NEON intrinsics 18 Add vp9_idct32x32_add_neon.c - vp9_idct32x32_1024_add_neon Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:20:04 -08:00
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
hkuang	4eee74d6ed	Fix clang ioc warning due to NULL src_mi pointer. The warning only happens in VP9 encoder's first pass due to src_mi is not set up yet. But it will not fail the encoder as left_mi and above_mi are not used in the first_pass and they will be set up again in the second pass. Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08	2014-12-09 14:32:48 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
hkuang	81e5cb86d3	Fix the comments. Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90	2014-12-08 12:44:09 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) performance can be improved. In the original code, the PSHUFB uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
hkuang	f925e5ce0f	Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."	2014-12-08 10:24:17 -08:00
hkuang	382f86f945	Improve the performance by caching the left_mi and right_mi in macroblockd. This improves the decode performance by ~2% on Nexus 7 2013. Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3	2014-12-05 16:25:42 -08:00
hkuang	eaa6deee5b	Merge "Merge set_prev_mi function into encoder function."	2014-12-05 15:12:50 -08:00
Peter de Rivaz	a306bd8274	Use the RTC optimizations when in high bitdepth mode. Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5	2014-12-04 15:48:42 -08:00
hkuang	62de07c8c6	Merge set_prev_mi function into encoder function. Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da	2014-12-04 14:44:23 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
Alex Converse	0496d11486	Fix a tautological assert. Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38	2014-11-24 15:01:01 -08:00
Debargha Mukherjee	e9d9f1adab	Merge "Refactored idct routines and headers"	2014-11-24 12:47:03 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Peter de Rivaz	3a8c43a479	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit `454342d4e7`)	2014-11-24 09:57:40 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit `db7192e0b0`) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00
Jingning Han	c42715b721	Enable ssse3 version of vp9_fdct8x8_quant It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097	2014-11-19 22:14:19 -08:00
Jingning Han	bf63652d34	Merge "Combine fdct8x8 and quantization process"	2014-11-19 11:17:44 -08:00
Jingning Han	ce77a7bcb0	Merge "Add sse2 version for vp9_quantize_fp"	2014-11-19 11:17:36 -08:00
Jingning Han	c6908fd5f7	Combine fdct8x8 and quantization process This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387	2014-11-18 18:10:56 -08:00
Jingning Han	2d3cc8ea2b	Add sse2 version for vp9_quantize_fp vp9_quantize_fp is the quantization process used by rtc coding mode. This commit adds a sse2 implementation of it. The implementation is modified based on vp9_quantize_b_sse2. No speed difference from ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a	2014-11-18 09:01:41 -08:00
Yaowu Xu	1687c47bfd	change to call vp9_refining_search_sad() directly The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5	2014-11-17 11:30:17 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit `d7422b2b1e`) (cherry picked from commit `6d741e4d76`)	2014-11-14 15:18:53 -08:00
Debargha Mukherjee	002172efd6	Merge "Added highbitdepth sse2 SAD acceleration and tests"	2014-11-12 21:20:34 -08:00
Peter de Rivaz	7eee487c00	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit `b1a6f6b9cb`)	2014-11-12 14:25:45 -08:00
Deb Mukherjee	cc57c5e4af	Iadst transforms to use internal low precision Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58 (cherry picked from commit `a1b726117f`)	2014-11-07 14:19:45 -08:00
Yaowu Xu	98492c1091	Merge "Change the use of a reserved color space entry"	2014-11-07 06:24:59 -08:00
Yaowu Xu	af3519a385	Change the use of a reserved color space entry This commit rename a reserved color space entry to BT_2020, it intends to provide support for VP9 bitstream to pass along the color space type defined in BT.2020(Rec.2020) please note this entry does not have any effect on encoding/decoding behavior, but allow applications to the pass the information along from encoding end to decoding end. Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091	2014-11-06 19:14:21 -08:00
Yunqing Wang	1228433430	Modify the frame context memory deallocation This patch was to fix the vpxdec fuzzing3 test failure. When an error occurs, setjmp() is invoked, which calls the decoder removing routine. In multiple thread situation, other threads could try to access the frame context memory that is already deallocated, thus causing a segfault. An invalid unit test was added for this issue. Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952	2014-11-06 11:34:19 -08:00
hkuang	e8860693ea	Merge "Totally remove prev_mi in VP9 decoder."	2014-11-05 17:48:47 -08:00
hkuang	4cc7c5a17f	Totally remove prev_mi in VP9 decoder. This will save the memory and improve the decode speed due to removing unnecessary memset of big prev_mi array for all the key frames. Decoding an all-key-frame 1080p video shows a speed improvement of around 2%. Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10	2014-11-05 16:14:30 -08:00
Yaowu Xu	2c4fee17bc	Fix visual studio 2013 compiler warnings For configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6	2014-11-05 13:47:28 -08:00
hkuang	23da920a8e	Fix the memory leak due to missing free frame_mvs. Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd	2014-11-04 13:28:31 -08:00
Yunqing Wang	6d90a9d289	Merge "WORKAROUND FIX FOR GCC4.9.1"	2014-11-03 16:56:38 -08:00
levytamar82	86175a5788	WORKAROUND FIX FOR GCC4.9.1 In the function mb_lpf_horizontal_edge_w_avx2_16 the usage of the intrinsic _mm256_cvtepu8_epi16 cause a compiler bug in gcc 4.9.1. until it will be fixed I created a workaround that create the up convert by using broadcast128+shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351	2014-11-01 11:27:28 -07:00
hkuang	55577431ae	Bind motion vectors with frame buffer structure. This will save a lot of memory for decoder due to removing of prev_mi, but prev_mi is still needed in encoder. So this will increase a little bit memory for encoder. Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed	2014-10-31 17:01:08 -07:00
Hui Su	d478d2df37	Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."	2014-10-30 11:05:04 -07:00
James Zern	01900edc40	Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9e * changes: add vp9_loop_filter_data_reset move LFWorkerData allocation to VP9LfSync vp9_loop_filter_frame_mt: remove pbi dependency vp9_loop_filter_frame_mt: pass planes directly vp9_loop_filter_frame_mt: pass VP9LfSync directly vp9: store TileWorkerData allocations separately	2014-10-24 11:43:51 -07:00
James Zern	01483677e5	add vp9_loop_filter_data_reset Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275	2014-10-23 19:43:48 +02:00
Yunqing Wang	330a6b2756	Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct"	2014-10-22 17:10:39 -07:00
Yunqing Wang	7c7e4d4eb8	vp9_ethread: allocate frame contexts outside VP9_COMMON struct This patch allocated frame contexts outside VP9_COMMON. This allows multiple threads to share the same copy of frame contexts, and reduces the overhead. It also guarantees the correct update of these contexts during bitstream packing. This patch doesn't change encoding result. Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353	2014-10-22 15:03:12 -07:00
Frank Galligan	95a568b3a8	Fix Neon convolve profiling When profiling, gprof can't distinguish between matching labels in different files. Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f	2014-10-22 10:51:53 -07:00
Hangyu Kuang	9ce3a7d76c	Implement frame parallel decode for VP9. Using 4 threads, frame parallel decode is ~3x faster than single thread decode and around 30% faster than tile parallel decode for frame parallel encoded video on both Android and desktop with 4 threads. Decode speed is scalable to threads too which means decode could be even faster with more threads. Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e	2014-10-22 10:50:58 -07:00
Hui Su	8947b18fa3	Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV. Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb	2014-10-21 15:41:37 -07:00
Yunqing Wang	687c56e802	Merge "SAD32xh and SAD64xh for AVX2"	2014-10-20 12:37:55 -07:00
levytamar82	7045aec00a	SAD32xh and SAD64xh for AVX2 All sad function that process above 32 consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64 vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90% both of them gave and overall ~2.3% user level gain Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd	2014-10-19 13:59:10 -07:00
Peter de Rivaz	73ae6e495c	Add highbitdepth function for vp9_avg_8x8 Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/ (`a92f987a6b`) on highbitdepth branch. Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8	2014-10-17 17:04:37 -07:00
James Zern	e9b8810b4d	move LFWorkerData allocation to VP9LfSync this removes an assumption that worker->data1 would be pointing to a TileWorkerData allocation. additionally, within the multi-threaded loopfilter pass VP9LfSync as a parameter to the worker hook, removing the need for a shadow pointer in LFWorkerData. Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88	2014-10-16 18:55:46 +02:00
Alex Converse	00a9671bbd	Merge "Add a 32-bit friendly sse2 quantizer."	2014-10-14 14:35:02 -07:00
Alex Converse	7497d2fb23	Add a 32-bit friendly sse2 quantizer. This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448	2014-10-14 11:37:41 -07:00
hkuang	c38a8edf16	Merge "Remove extra line."	2014-10-14 11:05:01 -07:00
Adrian Grange	f7c336aa19	Merge "Remove mi_grid_base_array from VP9_COMMON (unused)"	2014-10-14 07:50:17 -07:00
hkuang	c5fd035ce0	Use pre increment. Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f	2014-10-13 14:07:03 -07:00
Adrian Grange	83b63d573a	Remove mi_grid_base_array from VP9_COMMON (unused) Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0	2014-10-13 11:54:05 -07:00
hkuang	dbe91de6d4	Remove extra line. Change-Id: I5e79c276d8953ae17cd35b2846e6e40660c037c3	2014-10-10 14:59:04 -07:00
hkuang	effc1a6f56	Correct the code format. Change-Id: If2de420f8123a4e8bf635dd29205dd74ee174eee	2014-10-09 17:57:45 -07:00
Deb Mukherjee	9a29fdbae7	Merge "Rename highbitdepth functions to use highbd prefix"	2014-10-09 15:39:56 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
James Zern	caa0f81914	vp9_rtcd_defs: fix vp9_avg_8x8 declaration vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd	2014-10-09 10:44:42 +02:00
Jingning Han	f6ff752c63	Merge "Clean up header files in vp9_blockd.h and related files"	2014-10-08 15:25:09 -07:00
Jingning Han	1c3398675f	Merge "Use #define statement for MAX_MB_PLANE"	2014-10-08 15:24:56 -07:00
Jim Bankoski	20254d1daa	Merge "experimental : partition using 1/8 x 1/8 image"	2014-10-08 09:04:26 -07:00
Jim Bankoski	0ce51d823f	experimental : partition using 1/8 x 1/8 image The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624	2014-10-07 16:36:14 -07:00
Jingning Han	608c4acc1f	Merge "Remove vp9_blockd.h from vp9_common_data.c"	2014-10-07 15:34:07 -07:00
Jingning Han	3bbec7b422	Merge "Replace mi_width_log2() with mi_width_log2_lookup table"	2014-10-07 15:33:52 -07:00
Jingning Han	27c9577f8e	Merge "Take out repeated block width/height lookup functions"	2014-10-07 15:33:45 -07:00
Jingning Han	6ad272cb84	Clean up header files in vp9_blockd.h and related files This commit breaks the overly broad header files into more targeted and smaller ones, to help better structure the system layout. Change-Id: I7b24559d3ea6e582cf5d9bbe8f71459f9824d71b	2014-10-07 15:17:10 -07:00
Jingning Han	3c28fb768d	Use #define statement for MAX_MB_PLANE Change-Id: I3a7f83ab1dbfcedc8a82fe798c2fa30dd9c7d696	2014-10-07 15:00:22 -07:00
Jingning Han	d7febaf5c5	Remove extra empty line Change-Id: I6f2865bb8ba9295f5c45a4cad065aecbe1e63c32	2014-10-07 14:06:54 -07:00
Jingning Han	bd9706506f	Merge "Move inter filter defs to vp9_filter.h"	2014-10-07 13:42:26 -07:00
Jingning Han	ebd724852e	Remove vp9_blockd.h from vp9_common_data.c The basic data defs should be above block operation level. Change-Id: I7dd9836d01120ab75e0c472baac9f15495ed0db5	2014-10-07 13:02:54 -07:00
Jingning Han	7ee58985bd	Replace mi_width_log2() with mi_width_log2_lookup table Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5	2014-10-07 12:45:25 -07:00
Jingning Han	b66f7016c1	Take out repeated block width/height lookup functions The functions b_width_log2 and b_height_log2 only do direct table fetch. This commit unifies such use cases by using the table directly and removes these functions. Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b	2014-10-07 12:33:07 -07:00
Jingning Han	5d9cdac087	Move inter filter defs to vp9_filter.h Add comments on the use case of these definitions. Further reduce the scope of header file in vp9_context_tree.h. Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75	2014-10-07 12:16:37 -07:00
Deb Mukherjee	cfc337aae8	Merge "Resolves some static analysis / undefined warnings"	2014-10-07 12:15:26 -07:00
Deb Mukherjee	fced63ed30	Resolves some static analysis / undefined warnings Also fixes a case of distortion becoming negative and messing up the RDCOST computation. Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53	2014-10-07 11:20:56 -07:00
JackyChen	a9f479682a	Merge "Add SSE2 code and unit test for VP9 denoiser."	2014-10-07 10:51:55 -07:00
JackyChen	80465dae88	Add SSE2 code and unit test for VP9 denoiser. This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20%(using C code vs using SSE2 code), vary for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block size. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d	2014-10-06 15:27:40 -07:00
Jingning Han	12344f2697	Add range check in inverse ADST 16x16 Bit-stream clarification related to Issue 868. Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514	2014-10-06 11:07:58 -07:00
Deb Mukherjee	3bcc2af8cd	Some data type changes in vp9_idct.c Resolves a visual studio warning, and includes some cleanups. Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d	2014-10-04 16:03:04 -07:00
Deb Mukherjee	8a01074d04	Merge "Incorporate WRAPLOW macro into non-highbitdepth tx"	2014-10-03 12:45:39 -07:00
Deb Mukherjee	d50716face	Incorporate WRAPLOW macro into non-highbitdepth tx Incorporates the WRAPLOW macro into the non-highbitdepth transforms to aid hardware verification between a software C model and an intended hardware implementation through the use of the configure options: --enable-experimental --enable-emulate-hardware. Note that to avoid further discrepancies between the sse/sse2 implementations of the transforms and the C implementation, when the emulate hardware option is invoked, we also disable sse/sse2/etc. Also includes some minor cleanups/renaming etc. Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287	2014-10-03 11:38:05 -07:00
Yaowu Xu	f809475c73	Merge "Make iscan and scan neighbor arrays static const."	2014-10-02 15:15:58 -07:00
Yaowu Xu	9712bc691d	Make iscan and scan neighbor arrays static const. This commit changes the tables to be read only, which fixes issue #866 Change-Id: I85bbe03f9d344f50570f8c1c61699bdc5cee248f	2014-10-02 14:08:14 -07:00
Alexander Voronov	befc36d4a7	Fix invalid memory access in inter prediction (issue 853). Change-Id: I5a566d6ade720f212a60c0ad5d6f1ee1d1d37f2e	2014-10-02 18:57:47 +04:00
Jingning Han	c7d719325e	Merge "Remove redundant header file from vp9_idct.h"	2014-10-01 17:05:36 -07:00
Deb Mukherjee	30fbf23fda	Merge "High-bitdepth bugfixes"	2014-10-01 16:47:43 -07:00
Jingning Han	74c2997bc9	Remove redundant header file from vp9_idct.h Change-Id: Id92544762e7b96d3c729dfc8e04ecff91cbcc7f9	2014-10-01 14:58:27 -07:00
Deb Mukherjee	a160d72522	High-bitdepth bugfixes Miscellaneous bug-fixes for high bitdepth functionality. With this patch, high bit-depth profiles become mostly functional, except for an intermittent assert failure issue that is being tracked. Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c	2014-10-01 14:18:11 -07:00
Jingning Han	3d17f0d45f	Remove repeated vpx_integer.h from vp9_prob.h The file vpx_integer.h has been included and used in the parent file vp9_common.h. Change-Id: I9c65f08353576f9ef1e5ea17244fc5ca964ec002	2014-10-01 12:45:52 -07:00
Jingning Han	764c00ab50	Use precise header files in vp9_entropymv.h The commit cleans up the header files in vp9_entropymv.h. This file should only depend on vp9_mv.h and vp9_prob.h. Remove the giant vp9_blockd.h from header file list. Change-Id: I44cd26d2cfd10a16a9325778347dd53f888a874c	2014-10-01 12:41:08 -07:00
Deb Mukherjee	872b207b78	Moves transform type defines to vp9_common Moves transform type defines to vp9_common.h from vp9_idct.h so that they can be included in vp9_rtcd_defs.pl safely. Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb	2014-09-30 19:44:17 -07:00
James Zern	4a296e6baa	Revert "Fix compiling error in vp9_idct.h" This reverts commit `eafc8c9c40`. tran_low_t/tran_high_t don't belong in a public header, they're private. Similarly the public headers shouldn't rely on config defines, vpx_config.h isn't installed. Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad	2014-09-30 16:08:55 -07:00
Jingning Han	0829d2be7f	Remove redundant header file declaration Some header file in vp9_idct.c has been included in vp9_idct.h. This commit removes these redundant declarations. Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445	2014-09-30 09:13:00 -07:00
Jingning Han	eafc8c9c40	Fix compiling error in vp9_idct.h This commit fixes a compiling error in vp9_idct.h, where the codec checks that the intermediate steps of transformation fit within 16-bit length. The issue was due to broken file dependency. Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc	2014-09-30 09:11:59 -07:00
Deb Mukherjee	9ed23de13f	Miscellaneous decoder changes for high bitdepth Also includes yv12 config changes. Change-Id: Iacf40d8bf486815b54c32a127ce3cd4516b7e44f	2014-09-29 11:27:45 -07:00
hkuang	c53a95ad1d	Avoid calling vp9_is_scaled two times in a function. Use a local variable to hold the result of vp9_is_scaled. Change-Id: I5e203909805923e20eefef596bc84424da47dbe2	2014-09-25 11:52:16 -07:00
Yaowu Xu	845d4f333d	Fix a couple of comments The first comment is obsolete given the way is now normative in VP9 bitstream. The second comment line was too long. Change-Id: I6546585babf60d466485ddcf2daa6d2fa79e999a	2014-09-25 08:24:16 -07:00
Yaowu Xu	d237d483a5	Correct the condition for border extension As reported in issue #850, the condition for border extension was not complete. This commit added the case when the scaling is enabled. This fixes issue #850. Change-Id: I67768b23f0dcc4ac9a9aa0a0825b0fe8cb85a72e	2014-09-24 11:26:40 -07:00
Yaowu Xu	148c57d231	Merge "Fix invalid memory access on 2x downscale."	2014-09-24 09:58:05 -07:00
Alexander Voronov	eafd842a3e	Fix incorrect subsampling used in VP9 non420 loopfilter. Change-Id: Ia959e24b4676242c80a8867d2c39a6fee90f71a5	2014-09-24 17:01:09 +04:00
Deb Mukherjee	e2a90c0b21	Merge "High bit-depth loop/arf/postproc filter functions"	2014-09-23 17:26:32 -07:00
Deb Mukherjee	931ed516ba	High bit-depth loop/arf/postproc filter functions Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3	2014-09-23 16:20:43 -07:00
hkuang	c70cea97ac	Remove mi_grid_* structures. mi_grid_* are arrays of pointer to pointer. They save the pointers that point to the MIs in cm->mi. But they are unnecessary and complicated. The original goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer inside MODE_INFO_t, same goal could be achieved. This commit totally removes the mi_grid_* structures. But there are still many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit will do on-demand MODE_INFO_t allocation in order to save these memories. Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6	2014-09-19 21:27:11 -07:00
Deb Mukherjee	822b51609b	High bit-depth coefficient coding functions Tokenization and Detokenization enhancements for 10/12 bit Change-Id: I3c269ec30f8eb160ee024905638a193975237559	2014-09-19 15:21:24 -07:00
Frank Galligan	49dc7b05d0	Merge "FIX: vp9_loopfilter_intrin_sse2.c"	2014-09-18 15:10:16 -07:00
Scott LaVarnway	13284311eb	FIX: vp9_loopfilter_intrin_sse2.c Fixes Visual Studio build failures Change-Id: I233719cd63b3ad0db16e2834bf1d7ea1df805880	2014-09-18 13:09:13 -07:00
Deb Mukherjee	6d0ee9860e	Merge "Adds high bitdepth convolve, interpred & scaling"	2014-09-18 10:52:23 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Frank Galligan	4e066299d9	Merge "Improved mb_lpf_horizontal_edge_w_sse2_16() #2 "	2014-09-17 18:52:30 -07:00
Scott LaVarnway	217e3cb1fb	Improved mb_lpf_horizontal_edge_w_sse2_16() #2 The decoder performance improved up to 1% for the test clips used. Change-Id: I4621112bdccfba01640322facfa4ba8da8290ea5	2014-09-17 17:25:20 -07:00
Deb Mukherjee	7d0e4f9ad1	Resolves a few gcc warnings clang is fine. Change-Id: Ia4e9ff17ea3b86bc87dca35828ee7ce45bea6994	2014-09-16 22:44:40 -07:00
Deb Mukherjee	f7cf05cfe0	Merge "Adding high-bitdepth intra prediction functions"	2014-09-16 17:10:24 -07:00
Frank Galligan	ecd7e3d2b7	Merge "Remove memset of every external frame buffer."	2014-09-16 15:17:26 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Deb Mukherjee	5cd0aab81a	Adds high bitdepth quantization functions Adds various high bitdepth quantization functions. Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c	2014-09-16 14:55:37 -07:00
Yaowu Xu	601f3a886e	Fix a performance regression This commit adds back sse2 or ssse3 optimized versions of a couple of functions, which fixes a ~10% performance regression. Change-Id: I049786906e5a641224dced63c6492aec9d86d183	2014-09-16 11:18:46 -07:00
Frank Galligan	175d9dfe0a	Remove memset of every external frame buffer. Libvpx was memseting every external frame buffer before decode. This was to work around a valgrind issue in our C loop filter. Most of the time this was not needed and we have noticed some significant performance loss on some platforms. Now we require the application to zero out the buffers if it is using external frame buffers. Change-Id: I7330d00a315e65137ed30edd5f813e8929b76242	2014-09-15 15:37:36 -07:00
Alexander Voronov	29071a418e	Fix invalid memory access on 2x downscale. The issue was discovered on a bitstream with 2x vertical downscale. For zero MVs, y_pad is set to 1 only when vertical convolution is required. The original code assumes that for y_step_q4 == 32 we don't perform vertical convolution. But vp9_setup_scale_factors_for_frame() sets convolve functions so that when x_step and y_step are both not equal to 16, convolve in both directions is performed. And convolve() unconditionally subtracts one stride from the source pointer when it calls convolve_horiz(). This leads to invalid memory access. Change-Id: I882dfa6081a58e172b5ffa55842bfcd6727f10bf	2014-09-15 17:50:20 +04:00
Jingning Han	82fad6f4b6	Merge "Add a note for enum values of MV_REFERENCE_FRAME"	2014-09-13 10:42:45 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Deb Mukherjee	1e4136d35d	Adds high bit depth sad and variance functions Moves high bit depth sad/var functions from highbitdepth branch to master. Change-Id: If03845d8ef9c9c494e13350e7a587c289306b94d	2014-09-11 17:30:44 -07:00
Johann	ac2f2e7855	Merge "Allow specifying opt dependencies"	2014-09-11 16:02:41 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00
Jingning Han	3ef9786b7e	Add a note for enum values of MV_REFERENCE_FRAME Change-Id: Ifaf6738f26e86ded6eb6ea1465bad7a229612999	2014-09-11 10:55:42 -07:00
Jim Bankoski	0e66848081	Merge "LoopFilterWorkerData: remove misleading 'const'"	2014-09-10 06:33:51 -07:00
James Zern	2215d2f135	Merge changes If8887e1d,I36bfc9c8,I3d1e6c42 * changes: vp9_dthread: simplify loop_filter_row_worker signature simplify vp9_loop_filter_worker signature vp9_decodeframe: simplify tile_work_hook signature	2014-09-09 16:50:28 -07:00
Dmitry Kovalev	8e205a2a09	Merge "Cleaning up and speeding up vp9_idct32x32_1024_add_sse2()."	2014-09-09 12:50:23 -07:00
James Zern	7b572c9806	LoopFilterWorkerData: remove misleading 'const' 'frame_buffer' is modified indirectly via 'planes'. + do the same for vp9_loop_filter_rows Change-Id: Ibb7daa2e261064e4a5317a2969e3490e59891b82	2014-09-08 20:06:48 -07:00
James Zern	48662747bd	simplify vp9_loop_filter_worker signature use the type names directly in the function declaration rather than (void *arg1, void *arg2) Change-Id: I36bfc9c886310ce370bf0ca7c679ebd6e95109cc	2014-09-08 19:53:46 -07:00
Dmitry Kovalev	980abf6078	Fixing Mac OS build. Change-Id: Ifae8906185a868a07685eb7a7da2484af95e70a7	2014-09-08 08:53:12 -07:00
Dmitry Kovalev	70092af5c0	Cleaning up and speeding up vp9_idct32x32_1024_add_sse2(). Change-Id: If91017b792572c9db6e257011ca307bef8428486	2014-09-05 18:12:30 -07:00
Dmitry Kovalev	89963bf586	Merge "Removing postproc mmx code."	2014-09-05 18:11:08 -07:00
Dmitry Kovalev	54bec0971f	Merge "Initializing intra modes without vpx_once()."	2014-09-05 12:03:36 -07:00
Dmitry Kovalev	1100e262c5	Removing postproc mmx code. Removed functions: * vp9_post_proc_down_and_across_mmx * vp9_mbpost_proc_down_mmx * vp9_plane_add_noise_mmx They all have sse2 equivalent. Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff	2014-09-05 11:52:50 -07:00
James Zern	a8083449e9	fix x86-darwin* build vp9_variance_sse2.c contains a mix of intrinsics and references to assembly which uses x86inc.asm; it's conditionally included as a result. Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec	2014-09-04 23:32:13 -07:00
Dmitry Kovalev	490943552f	Removing unused function prototypes. Change-Id: Ia5e383e2cf18052f6f1eacf8b9495ab8e4d58878	2014-09-04 14:26:30 -07:00
Dmitry Kovalev	48197f0a70	Adding sse2 variant for vp9_mse{8x8, 8x16, 16x8}. Change-Id: I6786d25ce4f32b8d8912f2d239a45ca15b310c4b	2014-09-03 19:02:14 -07:00
Dmitry Kovalev	bf778e7d8e	Initializing intra modes without vpx_once(). Change-Id: I0a9d52432f2500f1bd8f43f229e70e38bb9a0343	2014-09-03 11:39:02 -07:00
Dmitry Kovalev	0ecc75c819	Merge "Removing MMX SAD calculation code."	2014-09-02 17:35:59 -07:00
Dmitry Kovalev	318fc0c34f	Removing MMX SAD calculation code. Removed functions: * vp9_sad_16x16_mmx * vp9_sad_8x16_mmx * vp9_sad_16x8_mmx * vp9_sad_8x8_mmx * vp9_sad_4x4_mmx Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3	2014-09-02 14:41:36 -07:00
Deb Mukherjee	5acfafb18e	Adds config opt for highbitdepth + misc. vpx Adds config parameter vp9_highbitdepth, to support highbitdepth profiles. Also includes most vpx level high bit-depth functions. However encode/decode in the highbitdepth profiles will not work until the rest of the code is in place. Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6	2014-09-02 14:37:10 -07:00
Dmitry Kovalev	12cd6f421d	Removing variance MMX code. Removed functions: * vp9_mse16x16_mmx * vp9_get_mb_ss_mmx * vp9_get4x4var_mmx * vp9_get8x8var_mmx * vp9_variance4x4_mmx * vp9_variance8x8_mmx * vp9_variance16x16_mmx * vp9_variance16x8_mmx * vp9_variance8x16_mmx They all have SSE2 equivalent. Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615	2014-08-29 10:26:42 -07:00
Dmitry Kovalev	eba83a0fdb	Merge "Replacing int_mv with MV inside the first pass code."	2014-08-25 13:56:14 -07:00
Dmitry Kovalev	a459e582cb	Replacing int_mv with MV inside the first pass code. Change-Id: Ia3be6b5a18e1ff6cc5c5f4d37e4a5d0972388308	2014-08-22 16:20:18 -07:00
Jim Bankoski	cebe2c8d88	vp9_postproc.c: unused parameter warning resolved Change-Id: I6d77a7c775c0482fd1f9bb03ea6f336dd2973fa0	2014-08-22 13:41:07 -07:00
Yaowu Xu	23c88870ec	Merge "Fix bug 804"	2014-08-21 08:56:32 -07:00
Adrian Grange	c5d8c1e785	Merge "get_ref_frame: fix test for valid buffer."	2014-08-15 10:41:28 -07:00
Adrian Grange	54f8cb78c6	Merge "Fix bug 837: realloc mode info buffers on resize"	2014-08-14 14:53:33 -07:00
Adrian Grange	89a213b4b0	get_ref_frame: fix test for valid buffer. In the current implementation of the encoder, frame buffers may come from the wider set of 12 such buffers, and are not restricted to the 8 allowed as reference frames. This is only an implementation detail and does not affect the constraint of having a total of 8 reference buffers overall. Change-Id: I075f777146c2df49c275d89232933f8127235175	2014-08-14 12:42:11 -07:00
Adrian Grange	4e30565a9f	Fix bug 837: realloc mode info buffers on resize The test to determine if the mode info buffers need to be resized when the frame size changes was incorrect, as per bug 837. By storing the size of the allocated data structure, a simple test determines whether to allocate more memory when the frame size changes. Change-Id: I1544698f2882cf958fc672485614f2f46e9719bd	2014-08-14 08:59:15 -07:00
James Zern	4b79563805	Merge "get_ref_frame: check ref_frame_map value"	2014-08-12 22:48:27 -07:00
James Zern	a6b7bd6a1c	Merge "fixes several -Wunused-function warnings"	2014-08-12 20:15:14 -07:00
James Zern	3caed4f8fd	get_ref_frame: check ref_frame_map value 'ref_frame_map' is initialized to -1. avoids using an invalid index if VP9_GET_REFERENCE/VP8_COPY_REFERENCE controls are issued after a decode error. Change-Id: I4599762c4d0b07a5943a72bf4a86ccb596cc062a	2014-08-12 17:47:04 -07:00
Jim Bankoski	f452961765	fixes several -Wunused-function warnings Change-Id: I4dc2cb255f4fe30998b6ee61184895dee9f5da8e	2014-08-12 16:51:07 -07:00
Adrian Grange	1ebf52df2c	Common encode/decode function to get reference frame Replaced encoder and decoder functions to get a pointer to a reference frame with a common function, vp9_get_ref_frame, and simplified it. Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee	2014-08-08 11:37:11 -07:00
Adrian Grange	75b42a4977	Remove coding_use_prev_mi member from VP9_COMMON This was shadowing the use of error_resilient_mode, but with the opposite sense. Change-Id: Ie4d30263a304fe4b3e94f0c7741db6888cc6afd8	2014-08-08 09:40:38 -07:00
levytamar82	69a5f5ecf7	Fix bug 807 in the sub_pixel_variance function the dst is aligned to 16 bytes and not to 32 bytes - now load unaligned data Change-Id: I2e0b9745543697efc56fefa32857ea10117af135	2014-08-07 18:51:02 -07:00
levytamar82	839911fb6d	Fix bug 804 A bug in the Microsoft compiler was found in the function vp9_filter_block1d16_v8_avx2 and a workaround was applied. The bug occurs when there are 4 consecutive maddubs + min + adds intrinsic instructions. Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b	2014-08-07 15:09:24 -07:00
levytamar82	af10457e02	Fix bug 806 in the function sad32x32x4d and sad64x64x4d the source is aligned to 16 bytes and not to 32 bytes - the load is now unaligned. Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c	2014-08-07 14:13:30 -07:00
Dmitry Kovalev	65234504b9	Merge "Removing direct references to VP9_COMP."	2014-08-07 14:12:32 -07:00
Deb Mukherjee	a468170804	Merge "Changes hdr for profiles > 1 for intraonly frames"	2014-08-07 11:15:38 -07:00
Deb Mukherjee	09bf1d61ca	Changes hdr for profiles > 1 for intraonly frames Specifies the bit-depth, color sampling and colorspace for intra only frames for profiles > 0 Also adds checks to ensure that profile 1 and 3 are exclusively used for non 420 streams. Change-Id: Icfb15fa1acccbce8f757c78fa8a2f60591360745	2014-08-07 09:47:14 -07:00
Yaowu Xu	0a2b25dcb9	configure: add --enable-coefficient-range-checking This commit adds a configure time option used to enable strict error checking in decoder to make sure intermediate stage coefficients of inverse transforms are within valid range of signed 16 bit integer. For valid VP9 input streams, intermediate stage coefficients should always stay within the range of a signed 16 bit integer. Coefficients can go out of this range for invalid/corrupt VP9 streams. However, strictly checking this range for every intermediate coefficient can be a burden for decoder, therefore such validation is only enabled with configure option --enable-coefficient-range-checking. Change-Id: I47d47c8c4e48a922c3d223ca59064f51b3f0f5ed	2014-08-06 17:13:16 -07:00
Dmitry Kovalev	09b3d04aac	Removing direct references to VP9_COMP. Change-Id: Ic37624d807884e71f08b50fd04892f03f2708ba7	2014-08-06 12:59:02 -07:00
Johann	7516abc7dc	Remove vp9_postproc_x86.h This configuration has moved to vp9_rtcd_defs.pl Change-Id: I71a31dbb8d79df226b60dd834324a5af69956c51	2014-08-05 15:46:13 -07:00
Jim Bankoski	128827d947	cast enums to int to avoid gcc warning in pred_common Change-Id: Ie3e478ef4fa565225d9e19a14d2f40aad966c2b6	2014-08-04 12:07:37 -07:00
Jim Bankoski	7f63dabfe9	break at the end of clauses with assert(0) to avoid gcc warning Change-Id: I1b3c5337f018dde27dc819ab18bd081d169a91e8	2014-08-04 08:52:53 -07:00
Jim Bankoski	3cf5908e24	uint8_t segment and skip to avoid signed / unsigned warnings Change-Id: I2e2765b851fb0a1b15351c2aa0e079197cbee373	2014-08-04 08:52:40 -07:00
James Zern	ce896df057	Merge "vp9_entropy: inline comes first to avoid warning."	2014-08-01 19:15:34 -07:00
James Zern	3a924f6ed1	Merge "signed unsigned mismatch - warning error"	2014-08-01 16:28:38 -07:00
Jim Bankoski	9c74e6aac7	vp9_entropy: inline comes first to avoid warning. Change-Id: I5b050122e6ed183a5b33c1f38e4fbf63b6721062	2014-08-01 16:05:30 -07:00
James Zern	1b6ac28a2f	Merge "removed sign mismatch warning"	2014-08-01 14:45:12 -07:00
Frank Galligan	5f8fa13258	Merge "Added vp9_sad8x8_neon()"	2014-08-01 14:11:38 -07:00
Scott LaVarnway	98165ec074	Neon version of vp9_sub_pixel_variance8x8(), vp9_variance8x8(), and vp9_get8x8var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~1.2%. Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102	2014-08-01 11:35:55 -07:00
Frank Galligan	5487b6067c	Merge "Neon version of vp9_sub_pixel_variance32x32(),"	2014-08-01 09:46:37 -07:00
Scott LaVarnway	545be78136	Added vp9_sad8x8_neon() Change-Id: I3be8911121ef9a5f39f6c1a2e28f9e00972e0624	2014-08-01 06:36:18 -07:00
Jim Bankoski	0f3689d32d	signed unsigned mismatch - warning error Change-Id: I991e36aa3cfa62aae6d27b253297dd9ca9e8bc12	2014-08-01 06:29:32 -07:00
Jim Bankoski	512f9b631f	removed sign mismatch warning Change-Id: Iaa40b472f6c1c48bb3bb47332b6fcf36d7f3c10e	2014-08-01 06:28:00 -07:00
Scott LaVarnway	6f4b8dcdc2	Neon version of vp9_subtract_block() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.2% Change-Id: I8862497264142171b7efc32df1a67714a23539f4	2014-07-31 09:28:06 -07:00
Scott LaVarnway	d39448e2d4	Neon version of vp9_sub_pixel_variance32x32(), vp9_variance32x32(), and vp9_get32x32var(). Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548	2014-07-31 08:00:36 -07:00
Scott LaVarnway	d4a37db5b8	Neon version of vp9_quantize_fp() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~12.4% Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7	2014-07-30 09:33:46 -07:00
Scott LaVarnway	521cf7e879	Neon version of vp9_sub_pixel_variance16x16(), vp9_variance16x16(), and vp9_get16x16var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~16.7%. Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb	2014-07-30 08:17:32 -07:00
Scott LaVarnway	d19d222db6	Added vp9_fdct8x8_neon(), vp9_fdct8x8_1_neon() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.7%. Change-Id: I428c72c40df82c6d537955e320a8debf99343004	2014-07-29 08:56:05 -07:00
levytamar82	4ba92dc5ab	Fix bug 805 Remove all the redundant dct functions (dct4x4, dct8x8) in avx2 except dct32x32 those functions were copied originally from dct_sse2 Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e	2014-07-28 15:46:01 -07:00
hkuang	44395a21da	Move vp9_dec_build_inter_predictors_* to decoder folder. Change-Id: Ibe9fa28440cc79ba9f3504d78c7dca7bb01a23e1	2014-07-28 11:09:11 -07:00
hkuang	7eca086707	Add segmentation map array for current and last frame segmentation. The original implementation only allocates one segmentation map and this works fine for serial decode. But for frame parallel decode, each thread needs to have its own segmentation map and the last frame segmentation map should be provided from the last frame decoding thread. After finishing decoding a frame, the thread needs to serve the old segmentation map that is associated with the previous decoded frame. The thread also needs to use another segmentation map for decoding the current frame. Change-Id: I442ddff36b5de9cb8a7eb59e225744c78f4492d8	2014-07-28 10:44:02 -07:00
Jingning Han	53844275e9	Fix potential ioc issue in vp9_get_prob for 4K above sizes This commit turns on the existing vp9_get_prob function using 64 bit in the intermediate step. It fixes the ioc issue for 4K above frame sizes (issue 828). Change-Id: I9f627f3beca2c522f73b38fd2a3e7eefdff01a7c	2014-07-24 15:35:51 -07:00
Alex Converse	5926e7c0e8	Remove unfinished VP9 alpha channel. Change-Id: Ic5d3a3a0dac10b49495771886a31e793bb78b5ca	2014-07-21 15:55:50 -07:00
Deb Mukherjee	727f384085	Merge "Separates profile 2 into 2 profiles 2 and 3"	2014-07-18 03:23:51 -07:00
Deb Mukherjee	c447a50aea	Separates profile 2 into 2 profiles 2 and 3 Separates HBD profile into two profiles (2 and 3) consistent with the highbitdepth branch. This patch is ported from the original highbitdepth branch patch: https://gerrit.chromium.org/gerrit/#/c/70460/ Two of the invalid file tests needed to be updated. Change-Id: I6a4acd2f7a60b1fb4cbcc8e0dad4eab4248431e3	2014-07-17 20:51:59 -07:00
Adrian Grange	8cb8aef7c7	Merge "Modified frame buffer handling"	2014-07-17 12:15:16 -07:00
Scott LaVarnway	ba0652e83a	Merge "Added vp9_sad64x64_neon(), vp9_sad32x32_neon()"	2014-07-17 11:42:16 -07:00
Adrian Grange	f68aaa38d6	Modified frame buffer handling This patch is the first step toward simplifying the frame buffer handling. The final goal is to have a common frame buffer handling framework for both encoder and decoder that incorporates the existing ability to use externally allocated memory. Change-Id: I2c378a4f54a39908915f46c4260e17a080db7ff1	2014-07-17 11:06:35 -07:00
Scott LaVarnway	696fa52eaa	Added vp9_sad64x64_neon(), vp9_sad32x32_neon() and vp9_sad16x16_neon() On a Nexus 7, vpxenc (in realtime mode, speed -6) reported a performance improvement of ~17%. Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85	2014-07-16 12:54:46 -07:00
Deb Mukherjee	1f6aaeddc5	Merge "Some extra bit probability cleanups"	2014-07-14 17:26:54 -07:00
hkuang	4c08120ca0	Merge "Include the right header for VP9 worker thread." into frame_parallel	2014-07-14 16:09:16 -07:00
hkuang	294b849796	Include the right header for VP9 worker thread. pthread.h is not supported on Windows. vp9_thread.h includes the emulation layer for pthread on Windows. Change-Id: I2b1c8ec299928472faca7ebeea998170c9f4d744	2014-07-14 16:03:38 -07:00
Jingning Han	6ce515b9ff	Merge "Fix chrome valgrind warning due to the use of mismatched bsize"	2014-07-13 11:07:44 -07:00
James Zern	0999a2a24e	Merge "vp9_loopfilter.c: cosmetics"	2014-07-11 16:02:21 -07:00
Jingning Han	3cddd81c6d	Fix chrome valgrind warning due to the use of mismatched bsize This commit fixes a mismatched use case of block size in non-RD intra prediction check. The residual SSE and variance should be calculated per transform block size, instead of operating block size, which caused chrome valgrind warning on conditional jump based on uninitialized value (webm issue 823). This commit resolves this issue. Change-Id: I595c06599c7e0fd0e4a08736519ba68fc14bc79a	2014-07-11 15:49:22 -07:00
hkuang	3cffa0c74e	Move vp9_thread.* to common. Prepare for frame parallel decoding, the reference count buffers need to be protected by mutex. Move vp9_thread.* to common folder so that those buffers could use cross-platform mutex from vp9_thread.*. (cherry picked from commit `337e8015c9`) Change-Id: I0587a08447925f4554d7788686a31483c2ae3f37	2014-07-11 15:24:31 -07:00
Yunqing Wang	7e340614c1	Merge "Remove unnecessary assertions"	2014-07-11 13:47:03 -07:00
Deb Mukherjee	6957e7a077	Some extra bit probability cleanups Refactoring to remove some duplication of probability tables between tokenization and detokenization. Change-Id: I2fc6a6497f9c0410021a9b41f828bc58a864e466	2014-07-11 11:39:18 -07:00
Yunqing Wang	978642a426	Remove unnecessary assertions Removed 2 unnecessary assertions. Change-Id: I0f8877d0494bf3ecdb0d7931ccbcaa8289e01d8b	2014-07-11 10:48:57 -07:00
Yaowu Xu	a75d55df1b	Remove an unused parameter Change-Id: I6ad6fd75dc3c9e6218d88148cf49e205398e2af5	2014-07-11 08:10:04 -07:00
James Zern	8a7cc1f47b	Merge "update vp9_thread.c"	2014-07-10 23:19:55 -07:00
James Zern	8701ed0270	update vp9_thread.c pull the latest from libwebp. Original source: http://git.chromium.org/webm/libwebp.git 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d Author: James Zern <jzern@google.com> Date: Tue Jul 8 19:53:28 2014 -0700 thread: remove harmless race on status_ in End() if a thread was still doing work when End() was called there'd be a race on worker->status_. in these cases, however, the specific value is meaningless as it would be >= OK and the thread would have been shut down properly, but we'll check 'impl_' instead to avoid any potential TSan/DRD reports. Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1 Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91	2014-07-10 12:20:54 -07:00
Yunqing Wang	1226d133df	Merge "Refactor vp9_diamond_search_sad function"	2014-07-10 11:06:32 -07:00
Yunqing Wang	46441ec5c8	Merge "Refactor refining_search_sad code"	2014-07-10 10:43:00 -07:00
hkuang	51e9788e58	Fix a bug in boundary checking. Change-Id: Ifc741da9da6f61c8d3c1f675ec6b8a96570f877d	2014-07-10 09:43:04 -07:00
Yunqing Wang	75cd57503d	Refactor vp9_diamond_search_sad function Currently, vp9_diamond_search_sadx4() is only called when sse3 is enabled, which is improper since sse2 optimization of sdx4df functions are available. Changed to always use vp9_diamond_search_sadx4(). Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717	2014-07-10 09:19:03 -07:00
James Zern	58609335b1	vp9_loopfilter.c: cosmetics - fix indent, spelling - drop some whitespace in some comments - add an assert in vp9_setup_mask, it shouldn't be called on decode error Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0	2014-07-09 17:27:57 -07:00
Yunqing Wang	30117a576d	Refactor refining_search_sad code There are sse2 optimization of sdx4df functions. Instead of calling vp9_refining_search_sadx4 only when sse3 is enabled, call it always. Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9	2014-07-09 16:50:11 -07:00
Jingning Han	f6bf614b2f	Merge "Re-design quantization process for 32x32 transform block"	2014-07-09 11:55:26 -07:00
hkuang	b84ee5a3d0	Merge "Move vp9_thread.* to common."	2014-07-09 10:16:13 -07:00
Jingning Han	9ad1b9fc67	Re-design quantization process for 32x32 transform block This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gains of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b	2014-07-08 16:55:28 -07:00
Adrian Grange	7c43fb67ae	Fix decoder handling of intra-only frames This patch fixes bug 633: https://code.google.com/p/webm/issues/detail?id=633 The first decoded frame does not have to be a keyframe, it could be an inter-frame that is coded intra-only. This patch fixes the handling of intra-only frames. A test vector has also been added that encodes 3 intra-only frames at the start of the clip. The test vector was generated using the code in the following patch: https://gerrit.chromium.org/gerrit/#/c/70680/ Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147	2014-07-08 16:24:03 -07:00
hkuang	337e8015c9	Move vp9_thread.* to common. Prepare for frame parallel decoding, the reference count buffers need to be protected by mutex. Move vp9_thread.* to common folder so that those buffers could use cross-platform mutex from vp9_thread.*. Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154	2014-07-07 14:52:19 -07:00
hkuang	28a794f680	Separate the frame buffers from VP9 encoder/decoder structure. Prepare for frame parallel decoding, the frame buffers must be separated from the encoder and decoder structure, while the encoder and decoder will hold the pointer of the BufferPool. Change-Id: I172c78f876e41fb5aea11be5f632adadf2a6f466	2014-07-02 15:34:20 -07:00
Yaowu Xu	82fd084b35	Merge "Re-design quantization process"	2014-07-01 19:04:01 -07:00
Jingning Han	9ac2f66320	Re-design quantization process This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
Alex Converse	6c54dbcb69	Merge "BITSTREAM: Handle transform size and motion vectors more logically for non-420."	2014-06-30 17:44:01 -07:00
James Zern	44472cde55	vp9: disable postproc buffer alloc when unnecessary the buffer is only used in encoding and only when CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled. a future change should decouple this from the frame buffer allocation and make it conditional based on runtime flags when the above config options are enabled. reduces decode heap usage by at least 12% Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf	2014-06-27 20:59:56 -07:00
Jim Bankoski	52b63c238e	Merge "Better validation of invalid files"	2014-06-27 11:05:21 -07:00
Jim Bankoski	9f37d149c1	Better validation of invalid files This patch checks that a decoder never tries to reference a frame that's outside the range of 2x to 1/16th the size of this frame. Any attempt to do so causes a failure. Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c	2014-06-27 10:03:15 -07:00
Jingning Han	46ea9ec719	Enable real-time version reference motion vector search This commit enables a fast reference motion vector search scheme. It checks the nearest top and left neighboring blocks to decide the most probable predicted motion vector. If it finds the two have the same motion vectors, it then skips finding exterior range for the second most probable motion vector, and correspondingly skips the check for NEARMV. The runtime of speed -5 goes down: pedestrian at 1080p, 29377 ms -> 27783 ms; vidyo at 720p, 11830 ms -> 10990 ms; i.e., 6%-8% speed-up. For rtc set, the compression performance goes down by about -1.3% for both speed -5 and -6. Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5	2014-06-26 09:49:13 -07:00
Adrian Grange	8357292a5a	Fix test on maximum downscaling limits There is a normative scaling range of (x1/2, x16) for VP9. This patch fixes the maximum downscaling tests that are applied in the convolve function. The code used a maximum downscaling limit of x1/5 for historic reasons related to the scalable coding work. Since the downsampling in this application is non-normative it will revert to using a separate non-normative scaler. Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f	2014-06-24 10:26:09 -07:00
Adrian Grange	8c1f071f1e	Allocate buffers based on correct chroma format The encoder currently allocates frame buffers before it establishes what the chroma sub-sampling factor is, always allocating based on the 4:4:4 format. This patch detects the chroma format as early as possible allowing the encoder to allocate buffers of the correct size. Future patches will change the encoder to allocate frame buffers on demand to further reduce the memory profile of the encoder and rationalize the buffer management in the encoder and decoder. Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57	2014-06-23 11:45:13 -07:00
Jingning Han	961bafc366	Merge "Remove unused vp9_init_quant_tables function"	2014-06-23 09:37:30 -07:00
Johann	1fc2b0fd00	Merge "Include type defines"	2014-06-20 11:29:19 -07:00
Johann	d658216276	Don't return value for void functions Clears "warning: 'return' with a value, in function returning void" Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2	2014-06-20 11:26:44 -07:00
Johann	baef0b89da	Include type defines Clears error: unknown type name 'uint8_t' Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2	2014-06-20 11:26:13 -07:00
Alex Converse	7557a65d16	BITSTREAM: Handle transform size and motion vectors more logically for non-420. This breaks the profile 1 bitstream. Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case chroma and luma planes are the same size. Disallowing larger transforms can result in a loss of compression efficiency and is inconsistent. For sub-8x8 blocks only average corresponding motion vectors. 4:2:0 and profile 0 behavior remains unchanged. Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8	2014-06-18 13:07:51 -07:00
Jingning Han	3b9c19aaa7	Remove unused vp9_init_quant_tables function This function is not effectively used, hence removed. Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49	2014-06-18 11:51:41 -07:00
James Zern	88df435d6b	Merge "vp9_rtcd: correct avx2 references"	2014-06-16 17:39:13 -07:00
Johann	79afb5eb41	Use lrand48 on Android When building x86 assembly use lrand48 instead of the undocumented inlined _rand function. Android now supports rand() https://android-review.googlesource.com/97731 but only for new versions. Original workaround: https://gerrit.chromium.org/gerrit/15744 Change-Id: I130566837d5bfc9e54187ebe9807350d1a7dab2a	2014-06-12 19:57:25 -07:00
Jingning Han	d5ae43318e	Merge "Fast computation path for forward transform and quantization"	2014-06-12 11:59:52 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
James Zern	9f3a0dbb5e	vp9_rtcd: correct avx2 references s/"\$avx2_x86inc"/"avx2"/ avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a	2014-06-10 16:26:36 -07:00
James Zern	cbce09ce62	Merge changes I6abc0657,I8224fba2,I04f64a45,I5d49d119,I76b4d171,I88c11ac3 * changes: vp9_sub_pixel_variance: disable avx2 variants vp9_sad*x4d: disable avx2 variants vp9_f(dct|ht): disable avx2 variants convolve: disable avx2 variants fdct8x8_test: add missing avx2 functions dct4x4_test: add missing avx2 functions	2014-06-10 16:14:45 -07:00
James Zern	520cb3f39f	vp9_sub_pixel_variance: disable avx2 variants tests failing under Win32/Win64 + variance_test: add missing avx2 functions (partially disabled) Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d	2014-06-10 16:11:15 -07:00
James Zern	d3ff009d84	vp9_sad*x4d: disable avx2 variants tests failing under Win32/Win64 + sad_test: add missing avx2 functions (disabled) Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856	2014-06-10 16:10:12 -07:00
hkuang	cdffeaaae0	Add mode info arrays and mode info index. In non frame-parallel decoding, this works the same way as current decoding scheme. Every time after decoder finish decoding a frame, it will swap the current mode info pointer and previous mode info pointer if the decoded frame needs to be shown. Both mode info pointer and previous mode info pointer are from mode info arrays. In frame-parallel decoding, this will become more complicated as current frame's mode info pointer will be shared with next frame as previous mode info pointer. But when one decoder thread finishes decoding one frame and starts to work on next available frame, it needs to retain the decoded frame's mode info pointers until next frame finishes decoding. The mode info index will serve this purpose. The decoder will use different buffer in the mode info arrays and use the other buffer to save previous decoded frame’s mode info. Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428	2014-06-10 13:43:36 -07:00
James Zern	dd9f502933	vp9_f(dct|ht): disable avx2 variants tests failing under Win32/Win64 + dct16x16_test: add missing avx2 functions (partially disabled) exercises the forward transforms no idct/iht implementations, so the c-code is used Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61	2014-06-09 18:48:11 -07:00
James Zern	5704578f5f	convolve: disable avx2 variants tests failing under Win32/Win64 Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf	2014-06-09 18:42:03 -07:00
Jingning Han	0c4a4225ec	Merge "Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs"	2014-06-03 16:51:39 -07:00
Dmitry Kovalev	19c492a749	Merge "Reusing existing vp9_get{8x8, 16x16}var() instead of new ones."	2014-06-03 10:04:27 -07:00
Deb Mukherjee	fc88292ef2	Remove Wextra warnings from vp9_sad.c As a side-effect, the sad unit tests for VP8 and VP9 had to be separated. Fixes a bug in original patch: (https://gerrit.chromium.org/gerrit/#/c/70163/8) that was reverted due to a nightly test failure. Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6	2014-06-02 13:50:20 -07:00
Frank Galligan	c40a968e13	Merge "Revert "Remove Wextra warnings from vp9_sad.c""	2014-06-01 16:58:11 -07:00
Frank Galligan	0b44988952	Revert "Remove Wextra warnings from vp9_sad.c" This reverts commit `916550428d` Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb	2014-06-01 16:20:26 -07:00
Jingning Han	ba6bed372b	Merge "Fix a potential overflow issue in inverse 16x16 full 2D-DCT"	2014-05-30 15:52:53 -07:00
Jingning Han	2c1cdf69b6	Fix a potential overflow issue in inverse 16x16 full 2D-DCT An overflow issue could potentially happen in the second round 1-D transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes this issue. Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc	2014-05-29 11:46:32 -07:00
Dmitry Kovalev	e14f900ae3	Merge "Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK."	2014-05-29 11:16:39 -07:00
Dmitry Kovalev	f7ff24cdd0	Reusing existing vp9_get{8x8, 16x16}var() instead of new ones. Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377	2014-05-29 11:14:06 -07:00
Jingning Han	6d21cbd20b	Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs This commit enables SSSE3 implementation of the inverse 2D-DCT with only first 10 coefficients non-zero. It reduces the runtime of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up. Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe	2014-05-28 10:53:33 -07:00
Jingning Han	d5bcef5242	Merge "Fix compiling error in MSVS"	2014-05-27 16:58:00 -07:00
Jingning Han	239e68ddbf	Fix compiling error in MSVS Need to include math.h before tmmintrin.h in some versions of MSVS. Change-Id: Ia6b83ae599316887ecf30c4e4b9e4355fb8a4219	2014-05-27 15:58:47 -07:00
Yunqing Wang	1f2200080b	Revert "Making vp9_get_sse_sum_{8x8, 16x16} static." This reverts commit `e8bbb3d9db`. Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc	2014-05-27 13:37:08 -07:00
Deb Mukherjee	444f93945b	Merge "Remove Wextra warnings from vp9_sad.c"	2014-05-27 11:54:05 -07:00
Yunqing Wang	a591ac9e5a	Merge "Fix decoder mismatch in sub-pixel AVX2 intrinsic filters"	2014-05-27 10:52:16 -07:00
levytamar82	773596050f	Fix decoder mismatch in sub-pixel AVX2 intrinsic filters The subpixel SSSE3 was fixed in this patch: https://gerrit.chromium.org/gerrit/#/c/70283/ So the equivalent AVX2 is fixed accordingly. Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a	2014-05-23 16:48:40 -07:00
Jingning Han	59c3f446fe	Merge "Inverse 16x16 2D-DCT SSSE3 implementation"	2014-05-23 16:01:22 -07:00
Jingning Han	48b0891370	Inverse 16x16 2D-DCT SSSE3 implementation This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d	2014-05-23 15:09:35 -07:00
Yunqing Wang	67ca5b586a	Merge "Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters"	2014-05-23 14:24:48 -07:00
Dmitry Kovalev	d7d7cedaaa	Merge "Removing vp9_pragmas.h."	2014-05-23 12:58:00 -07:00
Yunqing Wang	c5443fc881	Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters In 8-tap filtering, to guarantee the intermediate results fit in 16 bits, the order of accumulating the products needs to be done correctly, and the largest product should be added last. This patch fixed the problem using the method in commit "Correct ssse3 8/16-pixel wide sub-pixel filter calculation". Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423	2014-05-23 11:52:20 -07:00
Yaowu Xu	9410330893	Merge "change to use assembly version of ssse3 filter code"	2014-05-23 08:02:28 -07:00
Deb Mukherjee	916550428d	Remove Wextra warnings from vp9_sad.c As a side-effect, the sad unit tests for VP8 and VP9 had to be separated. Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71	2014-05-22 22:21:16 -07:00
Yaowu Xu	7a0c9b82f2	change to use assembly version of ssse3 filter code As mismatchs were found between the intrinsic version and c only. The commit temporarily revert to use the matching assembly version to allow further investigation. Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df	2014-05-22 17:11:57 -07:00
Yunqing Wang	aaf204e550	Merge "Fix a decoding mismatch in sub-pixel filters"	2014-05-22 17:09:14 -07:00
Yunqing Wang	efcdf946ed	Fix a decoding mismatch in sub-pixel filters This did the same correction as the one in commit "Correct ssse3 8/16-pixel wide sub-pixel filter calculation" to avoid saturation during filtering. Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6	2014-05-22 15:42:13 -07:00
Dmitry Kovalev	72ab966d5e	Removing vp9_pragmas.h. Change-Id: I9120a87e27e73e496932d11716937e2fad246521	2014-05-22 13:46:31 -07:00
Deb Mukherjee	e272273443	Renames x86_64 specific asm files Renames all x86_64 specific assembly files to consistently end in _x86_64.asm. This will be useful for build systems to handle these files differently. All new 64-bit specific assembly files should use the new naming convention. Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536	2014-05-21 13:55:56 -07:00
Dmitry Kovalev	35a83677a5	Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK. The final goal is eventually to get rid of both itxm_add and fwd_txm4x4. This patch does it in the decoder. Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5	2014-05-21 11:09:44 -07:00
Deb Mukherjee	ef750d8472	Merge "Extends temporal filtering to work for 422 data"	2014-05-20 16:31:28 -07:00
Deb Mukherjee	a185bc3350	Extends temporal filtering to work for 422 data This is needed for profiles 1 and 2. Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028	2014-05-20 15:19:40 -07:00
hkuang	20c1edf612	Refactor decode_tiles and loopfilter code. The current decode_tiles decodes the frame one tile by one tile and then loopfilter the whole frame or use another worker thread to do loopfiltering. |------|------|------|------| |Tile1-|Tile2-|Tile3-|Tile4-| |------|------|------|------| For example, if a tile video has one row and four cols, decode_tiles will decode the Tile1, then Tile2, then Tile3, then Tile4. And during decode each tile, decode_tile will decode row by row in each tile. For frame parallel decoding, decode_tiles will decode video in row order across the tiles. So the order will be: "Decode 1st row of Tile1" -> "Decode 1st row of Tile2" -> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4" -> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2" -> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row" Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b	2014-05-20 14:47:45 -07:00
Dmitry Kovalev	c23c613fdf	Merge "Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file."	2014-05-19 10:27:16 -07:00
Dmitry Kovalev	79ba41903f	Removing MACROBLOCKD dependency from loop filter. Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce	2014-05-16 09:48:26 -07:00
Adrian Grange	9dc9f17814	Merge "Fix post-processor macros & remove vizualization"	2014-05-16 09:01:41 -07:00
Dmitry Kovalev	619e6b539a	Merge "Removing redundant "8x8" suffix from MODE_INFO vars."	2014-05-15 17:53:31 -07:00
Jim Bankoski	ec82d2dfec	Merge "Revert "Remove Wextra warnings from vp9_sad.c""	2014-05-15 11:54:23 -07:00
Yunqing Wang	c661cf0dad	Merge "AVX2 To VP9 Block Error Optimization"	2014-05-15 11:29:29 -07:00
Dmitry Kovalev	ed784a0bc4	Removing redundant "8x8" suffix from MODE_INFO vars. Change-Id: I7ed7fecc959c6598ff98895f1a5cf7e11ac1615f	2014-05-15 11:14:42 -07:00
Adrian Grange	384bc5163c	Fix post-processor macros & remove vizualization Make all post-processor code conditionally compilable based on the CONFIG_VP9_POSTPROC macro. Also, remove the vizualization code from VP9 since it is out of date and will not compile. Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd	2014-05-15 08:35:36 -07:00
Jim Bankoski	a16794dd31	Revert "Remove Wextra warnings from vp9_sad.c" This reverts commit `7ab9a9587b` Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-VP8:Large./276/console Failed This patch did not address all the assembly issues some of the vp8 assembly counts on 5 arguments being passed in to this function: one example : vp8_sad8x16_wmt Please address or split this into vp9 and vp8 patches. Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca	2014-05-15 08:31:20 -07:00
Yaowu Xu	71854f3a6e	Merge "vp9_decodeframe.c: cleanup -wextra warnings"	2014-05-15 06:50:51 -07:00
Dmitry Kovalev	021eaabdb8	Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file. Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4	2014-05-14 16:21:41 -07:00
levytamar82	1fbab853c8	AVX2 To VP9 Block Error Optimization vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd	2014-05-14 11:51:07 -07:00
Deb Mukherjee	9687c057f8	Merge "Remove Wextra warnings from vp9_sad.c"	2014-05-14 10:01:50 -07:00
Yaowu Xu	ed09580777	vp9_decodeframe.c: cleanup -wextra warnings Change-Id: I0315cea6a5e58182bc2556e9825ec2ef0b1480c3	2014-05-14 09:46:11 -07:00
Jingning Han	e5bbb4cfd8	Merge "Silience -wextra warnings in vp9_reconintra.c"	2014-05-14 09:25:08 -07:00
Deb Mukherjee	7ab9a9587b	Remove Wextra warnings from vp9_sad.c As a side-effect, the max_sad check is removed from the C-implementation of VP8, for consistency with VP9, and to ensure that the SAD tests common to VP8/VP9 pass. That will make the VP8 C implementation of sad a little slower but given that is rarely used in practice, the impact will be minimal. Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca	2014-05-14 03:17:31 -07:00
Dmitry Kovalev	eecc750b33	Merge "Moving loopfilter call to vp9_decode_frame()."	2014-05-13 17:20:26 -07:00
Jingning Han	806fa6aaca	Silience -wextra warnings in vp9_reconintra.c The warning messages complained that there are unused arguments in a few prediction modes. This structure was designed on purpose, such that a wrapper function can cover all prediction mode cases and make them readily accessible as an pointer array. This commit silences such warnings. Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0	2014-05-13 12:54:23 -07:00
Adrian Grange	fd6bf31b8a	vp9_convolve.c: cleanup -wextra warnings Change-Id: I04930aca2293ebbaeb96dfedd2f9c5a55762fd2e	2014-05-13 09:57:24 -07:00
Dmitry Kovalev	ae7d3ef39f	Moving loopfilter call to vp9_decode_frame(). Inline loopfilter has been already handled in vp9_decode_frame(). Collecting all similar code in one place now. Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd	2014-05-12 16:19:19 -07:00
Johann	ce23931a3f	Only build neon assembly for armv7 targets Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477	2014-05-12 08:52:02 -07:00
Alex Converse	ec8a3272fa	Merge "Add an x86inc MMX fwht4x4."	2014-05-09 13:48:49 -07:00
Jingning Han	9412785b02	Merge changes I3edd4b95,I4514f974,Ie7fa4386 * changes: Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT Change eob threshold for partial inverse 8x8 2D-DCT to 12 SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero	2014-05-09 09:58:39 -07:00
Alex Converse	b5422fab46	Add an x86inc MMX fwht4x4. Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197	2014-05-08 12:01:27 -07:00
Jingning Han	41a350a83d	Change eob threshold for partial inverse 8x8 2D-DCT to 12 The scanning order has the first 12 coefficients of the 8x8 2D-DCT sitting in the top left 4x4 block. Hence the partial inverse 8x8 2D-DCT allows to handle cases with eob below 12. The overall runtime of the inverse 8x8 2D-DCT unit is reduced from 166 cycles (using SSE2) to 150 cycles (using SSSE3). Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2	2014-05-08 09:48:58 -07:00
Jingning Han	9e7b09bc5d	SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero This commit enables ssse3 assembly implementation of the 8x8 inverse 2D-DCT with only first 10 coefficients non-zero. The average runtime for this unit goes down from 198 cycles to 129 cycles (34.8% faster). Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7	2014-05-07 17:40:02 -07:00
Dmitry Kovalev	68a600d82a	Merge "Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c."	2014-05-07 13:34:05 -07:00
Paul Wilkins	33b1c457ed	Revert "Add an MMX fwht4x4" Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit `89fbf3de50`. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd	2014-05-07 12:53:27 +01:00
Alex Converse	75d05d5ed4	Merge "Add an MMX fwht4x4"	2014-05-06 11:12:27 -07:00
Jingning Han	d289deb04c	Merge "SSSE3 implementation of full inverse 8x8 2D-DCT"	2014-05-06 09:17:22 -07:00
Dmitry Kovalev	e8bbb3d9db	Making vp9_get_sse_sum_{8x8, 16x16} static. Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a	2014-05-05 19:12:38 -07:00
Alex Converse	89fbf3de50	Add an MMX fwht4x4 7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64	2014-05-05 15:10:48 -07:00
Jingning Han	52ae97b6aa	SSSE3 implementation of full inverse 8x8 2D-DCT This commit enables SSSE3 version full inverse 8x8 2D-DCT and reconstruction. It makes the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3	2014-05-05 10:49:27 -07:00
Dmitry Kovalev	25a666ef39	Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c. Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a	2014-05-01 16:45:49 -07:00
Jingning Han	39761eb5d6	Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT"	2014-04-30 13:41:36 -07:00
Dmitry Kovalev	d2bc8816a1	Merge "Adding search_site_config struct."	2014-04-29 16:59:47 -07:00
Jingning Han	1eaa3a76dc	Enable SSSE3 implementation of 8x8 forward 2D-DCT Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4	2014-04-29 15:49:18 -07:00
Dmitry Kovalev	9b042dc04c	Merge "Removing unused vp9_variance_halfpixvar*() functions."	2014-04-29 14:52:58 -07:00
Dmitry Kovalev	aa464eca5e	Adding search_site_config struct. Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf	2014-04-29 10:34:53 -07:00
Dmitry Kovalev	7b59014b74	Removing old unused vp9_tapify.py. Change-Id: I7d66987fd04a3f98c140fc5f99ed0e9bc01f61d0	2014-04-25 15:19:31 -07:00
Dmitry Kovalev	6e01079cc0	Removing unused vp9_variance_halfpixvar*() functions. Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3	2014-04-25 11:50:07 -07:00
Dmitry Kovalev	03e7deae4f	Removing unused vp9_sub_pixel_mse* functions. Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508	2014-04-24 11:49:12 -07:00
Dmitry Kovalev	e608418899	Renaming MB_PREDICTION_MODE to PREDICTION_MODE. Actually, it would be great to have two separate enums INTRA_MODES and INTER_MODES in future. Change-Id: I6c4147cf0002853da9c1e03fe9514eab876f01c8	2014-04-22 17:48:31 -07:00
Dmitry Kovalev	55977e4a4f	Merge "Moving frame_frags field from VP9Common to VP9_COMP."	2014-04-15 10:39:31 -07:00
Dmitry Kovalev	63fa722179	Removing unused cost arguments from mcomp functions. Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434	2014-04-11 10:24:36 -07:00
Yunqing Wang	23ccf71924	Merge "Fix encoder uninitialized read errors reported by drmemory"	2014-04-10 09:45:08 -07:00
Dmitry Kovalev	1d5ed021fb	Moving frame_frags field from VP9Common to VP9_COMP. Change-Id: I0f4a5c50561a2653d22c366c214a937272ecfa2c	2014-04-09 20:56:06 -07:00
Dmitry Kovalev	65e650e0c0	Merge "Revert "Converting set_prev_mi() to get_prev_mi().""	2014-04-09 20:44:30 -07:00
Dmitry Kovalev	60def47f21	Revert "Converting set_prev_mi() to get_prev_mi()." This reverts commit `22a3e30790` Change-Id: I460d905edf5fb2006da58c18fbe02c04d0c631bb	2014-04-09 15:23:16 -07:00
Tom Finegan	4fffefe189	Merge "Fix avx builds on macosx with clang 5.0."	2014-04-09 13:03:26 -07:00
Dmitry Kovalev	5ed83c3220	Merge "Converting set_prev_mi() to get_prev_mi()."	2014-04-09 10:27:05 -07:00
Yunqing Wang	2e7d327789	Merge "Use source frame difference to make partition decision"	2014-04-09 10:26:42 -07:00
Yunqing Wang	3a6670fcf8	Fix encoder uninitialized read errors reported by drmemory This patch fixed the uninitialized read errors in Issue 748: "dr memory VP9 encode errors". In vp9_convolve_avg_sse2, when width is 4, pavgb reads 8 bytes from dst buffer that is out of range. An error is reported although the data is not actually used later. This issue was resolved by preventing uninitialized reads. Change-Id: I109a54910aa47139cb13119de86f2062cff207df	2014-04-09 09:59:15 -07:00
Tom Finegan	f600b50a6e	Fix avx builds on macosx with clang 5.0. The macosx release of clang v5.0 identifies itself as: Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn) This version of clang uses the older _mm_broadcastsi128_si256, like v3.3, as given away in the LLVM svn version above. Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c	2014-04-08 18:56:03 -07:00
Yunqing Wang	4e66293fcb	Use source frame difference to make partition decision Calculate the difference variance between last source frame and current source frame. The variance is calculated at 16x16 block level. The variances are compared to several thresholds to decide final partition sizes. An adaptive strategy is implemented to decide using SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions in the video. The switching test is done once every search_type_check_frequency frames. The selection of source_var_thresh needs to be investigated further later. RTC set Borg test showed 0.424% overall psnr gain, and 0.357% ssim gain. For clips with large enough static area, the encoding speedup is around 2% to 15%. Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156	2014-04-08 17:03:02 -07:00
Deb Mukherjee	d35df2d8ea	High-level hooks for Profile 2 (10/12 bit) Adds some high-level hooks for profile 2 before further progress on the implementation. According to the definitiion in this patch: 1. Profile 2 only supports 10 or 12 bit color but not 8 2. Profile 2 supports all color sampling modes: 444, 422 and 420, and alpha plane. 3. Profile 3 is currently undefined. Please consider the definition carefully and suggest modifications to the definition as needed. Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995	2014-04-08 16:18:34 -07:00
Dmitry Kovalev	22a3e30790	Converting set_prev_mi() to get_prev_mi(). Change-Id: Iad4002d7aecaae0e25d88e286bacde7e6cd7264f	2014-04-07 16:01:34 -07:00
Dmitry Kovalev	b5e12dda52	Cleaning up vp9_{cx, dx}_iface.c files. Change-Id: Ib4e31ba74c4b882bd93942ef743f4a189892738d	2014-04-07 10:38:51 -07:00
Dmitry Kovalev	a9f324fa7f	Removing interp_kernel from MACROBLOCKD. Now interp_kernel is obtained when it is really required (based on mbmi->interp_filter value). Change-Id: I4c7a93c179d1045eba16e7526c293d02c9b8b47e	2014-04-03 15:28:42 -07:00
Dmitry Kovalev	8b8606a737	Merge "Cleaning up vp9_mvref_common.c."	2014-04-02 11:03:36 -07:00
Dmitry Kovalev	68027a0b8a	Merge "Grouping members in MB_MODE_INFO struct."	2014-04-02 11:00:58 -07:00
Dmitry Kovalev	86f44a91f4	Renaming two members in MACROBLOCKD struct. Renames: mi_8x8 -> mi mode_info_stride -> mi_stride Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be	2014-04-01 17:46:40 -07:00
Dmitry Kovalev	d42976c515	Common configuration for MACROBLOCKD struct. Change-Id: Ie2ea9dd8bd338cc9fe12ca9033df64f7644c68b3	2014-04-01 10:57:59 -07:00
Dmitry Kovalev	20d868f05d	Grouping members in MB_MODE_INFO struct. Change-Id: Ia6d7e7a08810e0c3401da4d10266828d560e6851	2014-03-28 17:44:13 -07:00
Yaowu Xu	4f857bacd2	[BITSTREAM]Fix the scaling calculation For very large size video image, the scaling calculation may need use value beyond the range of int. This commit upgrade the value to 64bit to make sure the calculation do not wrap around INT_MAX. The change corrected the decoder behavior. The bug affects only very large resolution video because the scaling calculation was sufficient for image size smaller than 2^13. This resolves issue: https://code.google.com/p/webm/issues/detail?id=750 Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5	2014-03-28 16:40:29 -07:00
Dmitry Kovalev	03349d2ba2	Moving dqcoeff array to MACROBLOCKD in decoder. Change-Id: I3e20c0cdb9d2437bddf21afb255855f2dead8e02	2014-03-28 10:36:16 -07:00
Dmitry Kovalev	38053687bc	Cleaning up vp9_mvref_common.c. Change-Id: I4eb815156ecaab02c9182e6e1abbea0e4d86c441	2014-03-27 17:50:02 -07:00
Dmitry Kovalev	0437575848	Merge "Removing prev_mi_8x8 from MACROBLOCKD."	2014-03-26 15:45:11 -07:00
Dmitry Kovalev	38c2d37b9d	Merge "Cleaning up vp9_entropymv.c."	2014-03-26 14:28:45 -07:00
Dmitry Kovalev	63f86c149a	Removing prev_mi_8x8 from MACROBLOCKD. Change-Id: I32beb5f18c10b5771146c55933b5555487f53633	2014-03-26 10:50:34 -07:00
Dmitry Kovalev	ed39c40a2e	Moving above_context to VP9_COMMON. Change-Id: I713af99d1e17e05a20eab20df51d74ebfd1a68d2	2014-03-25 10:40:08 -07:00
Yaowu Xu	34a3628a45	Merge "Fixed a build issue"	2014-03-25 10:22:18 -07:00
Yaowu Xu	59872069d2	Merge "Change back the scaling calculation."	2014-03-25 09:48:21 -07:00
Yaowu Xu	8051563972	Fixed a build issue Adding the missed include file. Change-Id: I7e48df6b0633afbebaf1ccb3062ae404e7203dc9	2014-03-25 09:45:54 -07:00
Dmitry Kovalev	5b8c834c1a	Initialization code cleanup. Change-Id: I47a8b4bf9a6cc0063d1a6785eaaad641d0659e24	2014-03-24 12:21:22 -07:00
Dmitry Kovalev	49bb6df0e2	Cleaning up vp9_entropymv.c. Change-Id: I01b3530779da89acb84c71bac5ccac456f00c5ac	2014-03-24 11:02:27 -07:00
Yunqing Wang	b458bb7c20	Merge "AVX2 SAD Optimization:"	2014-03-24 10:52:32 -07:00
Dmitry Kovalev	ac5bdc0ed8	Merge "Cleaning up vp9_loopfilter.c."	2014-03-24 09:02:06 -07:00
hkuang	22232ec602	Change back the scaling calculation. Let the calculation to be compatible with Google's HW implementation. Change-Id: I22e179888cdb0419e230351c0a47661b37051fef	2014-03-24 08:32:56 -07:00
Dmitry Kovalev	9895c9d4dd	Merge "Removing redundant {above, left}_seg_context manipulation code."	2014-03-22 22:31:48 -07:00
Dmitry Kovalev	2786938a3c	Merge "Renaming and making vp9_update_mode_info_border() static."	2014-03-21 21:19:18 -07:00
Dmitry Kovalev	58cc06f9b3	Cleaning up vp9_loopfilter.c. Change-Id: I7c7cf7d3c7b00d1c74ffa8aa8fb8d78a0e48326f	2014-03-21 16:31:15 -07:00
Frank Galligan	8345e76d61	Merge "Fix libvpx VP9 decoder dr memory errors"	2014-03-21 15:24:39 -07:00
Dmitry Kovalev	e141f10bfc	Renaming and making vp9_update_mode_info_border() static. Change-Id: Ibb72a29cae9ca9443aae56fc4c5458d190eae279	2014-03-21 14:02:25 -07:00
levytamar82	0fa8b668c1	AVX2 SAD Optimization: 2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb	2014-03-21 13:53:32 -07:00
Yunqing Wang	9b5df3fabe	Fix libvpx VP9 decoder dr memory errors Fixed dr memory errors reported in Issue 736: https://code.google.com/p/webm/issues/detail?id=736 All elements in left_col buffer need to be initialized to ensure the correctness of SIMD operations in x86 optimized code. Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4	2014-03-21 12:23:47 -07:00
Dmitry Kovalev	4cb37bff96	Removing redundant {above, left}_seg_context manipulation code. Change-Id: Ib3c1746e61220c629cbd971b2458aa686b5c9e36	2014-03-21 12:12:55 -07:00
Dmitry Kovalev	a57de9da03	Merge "Reusing {above, left}_seg_context vars in both encoder and decoder."	2014-03-21 12:02:42 -07:00
Yaowu Xu	46c71e5eba	Merge "Remove duplicate declaration"	2014-03-21 08:44:04 -07:00
Dmitry Kovalev	7ad40117f1	Reusing {above, left}_seg_context vars in both encoder and decoder. Change-Id: Id1fa36c92cb007b73a450cc8552e810cedad38b9	2014-03-20 16:15:57 -07:00
Dmitry Kovalev	03781ff22d	Merge "Removing mi_stream."	2014-03-20 13:43:13 -07:00
Dmitry Kovalev	4b37dc8d87	Adding alloc_mi() function. Change-Id: I3b944884c048f589c86e0169aeb3c3855bc8b729	2014-03-19 13:31:47 -07:00
Yaowu Xu	7ef16efca1	Remove duplicate declaration Change-Id: Ic8e52a89e0df816c38cd8ff1b7c53862b9a6dff2	2014-03-19 12:23:32 -07:00
Yaowu Xu	8cb59992e8	Merge "Fix the md5 mismatch for some scale cases."	2014-03-19 11:13:28 -07:00
Dmitry Kovalev	8ccfcb765f	Removing mi_stream. Change-Id: If674140e30c223c88894b983fd22a583efb99dcf	2014-03-19 10:47:32 -07:00
Dmitry Kovalev	b8bc2d337a	Fixing warnings/errors from c++ compiler. Change-Id: Ia561dda53f2dd10e3a10a2df2adb8027ab19397a	2014-03-18 10:47:51 -07:00
hkuang	1f7e4856f8	Fix the md5 mismatch for some scale cases. Fixes issue #731 Change-Id: Id313e84b8fb4ff20f6a4e1ed11cb601927888318	2014-03-17 11:21:43 -07:00
Dmitry Kovalev	7c6337ba9e	Merge "Adding vp9_swap_mi_and_prev_mi() function."	2014-03-13 17:47:27 -07:00
Dmitry Kovalev	d8e5564129	Using MB_PREDICTION_MODE enum instead of int. Change-Id: I652d17f7bff84f75d015f4f39652472e14eb3134	2014-03-13 15:03:00 -07:00
Dmitry Kovalev	e65c564c78	Adding vp9_swap_mi_and_prev_mi() function. Change-Id: I18b3939f0b51085cdd25c9182c3a9c7536ca7e3e	2014-03-13 13:55:33 -07:00
Dmitry Kovalev	3dca8ca7af	Merge "Renaming mode2txfm_map to intra_mode_to_tx_type_lookup."	2014-03-12 23:29:29 -07:00
Yaowu Xu	17256ad763	Revert "With on demand border extension, clamping the MV" This reverts commit `b0fec6ab4a`. Change-Id: I9acd8ee0423f22d92138f11579611ff959331013	2014-03-12 19:40:15 -07:00

3255 Commits