generic-library/vpx

Author	SHA1	Message	Date
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
hkuang	4eee74d6ed	Fix clang ioc warning due to NULL src_mi pointer. The warning only happens in the VP9 encoder's first pass because src_mi is not set up yet. But it will not fail the encoder, as left_mi and above_mi are not used in the first pass and they will be set up again in the second pass. Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08	2014-12-09 14:32:48 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
hkuang	81e5cb86d3	Fix the comments. Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90	2014-12-08 12:44:09 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) performance can be improved. In the original code, the PSHUFB uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
hkuang	f925e5ce0f	Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."	2014-12-08 10:24:17 -08:00
hkuang	382f86f945	Improve the performance by caching the left_mi and right_mi in macroblockd. This improves the decode performance by ~2% on Nexus 7 2013. Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3	2014-12-05 16:25:42 -08:00
hkuang	eaa6deee5b	Merge "Merge set_prev_mi function into encoder function."	2014-12-05 15:12:50 -08:00
Peter de Rivaz	a306bd8274	Use the RTC optimizations when in high bitdepth mode. Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5	2014-12-04 15:48:42 -08:00
hkuang	62de07c8c6	Merge set_prev_mi function into encoder function. Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da	2014-12-04 14:44:23 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
Alex Converse	0496d11486	Fix a tautological assert. Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38	2014-11-24 15:01:01 -08:00
Debargha Mukherjee	e9d9f1adab	Merge "Refactored idct routines and headers"	2014-11-24 12:47:03 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Peter de Rivaz	3a8c43a479	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit `454342d4e7`)	2014-11-24 09:57:40 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit `db7192e0b0`) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00
Jingning Han	c42715b721	Enable ssse3 version of vp9_fdct8x8_quant It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097	2014-11-19 22:14:19 -08:00
Jingning Han	bf63652d34	Merge "Combine fdct8x8 and quantization process"	2014-11-19 11:17:44 -08:00
Jingning Han	ce77a7bcb0	Merge "Add sse2 version for vp9_quantize_fp"	2014-11-19 11:17:36 -08:00
Jingning Han	c6908fd5f7	Combine fdct8x8 and quantization process This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387	2014-11-18 18:10:56 -08:00
Jingning Han	2d3cc8ea2b	Add sse2 version for vp9_quantize_fp vp9_quantize_fp is the quantization process used by rtc coding mode. This commit adds a sse2 implementation of it. The implementation is modified based on vp9_quantize_b_sse2. No speed difference from ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a	2014-11-18 09:01:41 -08:00
Yaowu Xu	1687c47bfd	change to call vp9_refining_search_sad() directly The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5	2014-11-17 11:30:17 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit `d7422b2b1e`) (cherry picked from commit `6d741e4d76`)	2014-11-14 15:18:53 -08:00
Debargha Mukherjee	002172efd6	Merge "Added highbitdepth sse2 SAD acceleration and tests"	2014-11-12 21:20:34 -08:00
Peter de Rivaz	7eee487c00	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit `b1a6f6b9cb`)	2014-11-12 14:25:45 -08:00
Deb Mukherjee	cc57c5e4af	Iadst transforms to use internal low precision Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58 (cherry picked from commit `a1b726117f`)	2014-11-07 14:19:45 -08:00
Yaowu Xu	98492c1091	Merge "Change the use of a reserved color space entry"	2014-11-07 06:24:59 -08:00
Yaowu Xu	af3519a385	Change the use of a reserved color space entry This commit renames a reserved color space entry to BT_2020. It is intended to let a VP9 bitstream pass along the color space type defined in BT.2020 (Rec.2020). Please note that this entry does not have any effect on encoding/decoding behavior, but allows applications to pass the information along from the encoding end to the decoding end. Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091	2014-11-06 19:14:21 -08:00
Yunqing Wang	1228433430	Modify the frame context memory deallocation This patch was to fix the vpxdec fuzzing3 test failure. When an error occurs, setjmp() is invoked, which calls the decoder removing routine. In multiple thread situation, other threads could try to access the frame context memory that is already deallocated, thus causing a segfault. An invalid unit test was added for this issue. Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952	2014-11-06 11:34:19 -08:00
hkuang	e8860693ea	Merge "Totally remove prev_mi in VP9 decoder."	2014-11-05 17:48:47 -08:00
hkuang	4cc7c5a17f	Totally remove prev_mi in VP9 decoder. This will save memory and improve the decode speed by removing the unnecessary memset of the big prev_mi array for all the key frames. Decoding an all-key-frame 1080p video shows a speed improvement of around 2%. Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10	2014-11-05 16:14:30 -08:00
Yaowu Xu	2c4fee17bc	Fix visual studio 2013 compiler warnings For configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6	2014-11-05 13:47:28 -08:00
hkuang	23da920a8e	Fix the memory leak due to missing free frame_mvs. Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd	2014-11-04 13:28:31 -08:00
Yunqing Wang	6d90a9d289	Merge "WORKAROUND FIX FOR GCC4.9.1"	2014-11-03 16:56:38 -08:00
levytamar82	86175a5788	WORKAROUND FIX FOR GCC4.9.1 In the function mb_lpf_horizontal_edge_w_avx2_16 the use of the intrinsic _mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1. Until it is fixed, I created a workaround that performs the up-convert using broadcast128+shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351	2014-11-01 11:27:28 -07:00
hkuang	55577431ae	Bind motion vectors with frame buffer structure. This will save a lot of memory for decoder due to removing of prev_mi, but prev_mi is still needed in encoder. So this will increase a little bit memory for encoder. Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed	2014-10-31 17:01:08 -07:00
Hui Su	d478d2df37	Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."	2014-10-30 11:05:04 -07:00
James Zern	01900edc40	Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9e * changes: add vp9_loop_filter_data_reset move LFWorkerData allocation to VP9LfSync vp9_loop_filter_frame_mt: remove pbi dependency vp9_loop_filter_frame_mt: pass planes directly vp9_loop_filter_frame_mt: pass VP9LfSync directly vp9: store TileWorkerData allocations separately	2014-10-24 11:43:51 -07:00
James Zern	01483677e5	add vp9_loop_filter_data_reset Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275	2014-10-23 19:43:48 +02:00
Yunqing Wang	330a6b2756	Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct"	2014-10-22 17:10:39 -07:00
Yunqing Wang	7c7e4d4eb8	vp9_ethread: allocate frame contexts outside VP9_COMMON struct This patch allocated frame contexts outside VP9_COMMON. This allows multiple threads to share the same copy of frame contexts, and reduces the overhead. It also guarantees the correct update of these contexts during bitstream packing. This patch doesn't change encoding result. Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353	2014-10-22 15:03:12 -07:00
Frank Galligan	95a568b3a8	Fix Neon convolve profiling When profiling, gprof can't distinguish between matching labels in different files. Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f	2014-10-22 10:51:53 -07:00
Hangyu Kuang	9ce3a7d76c	Implement frame parallel decode for VP9. Using 4 threads, frame parallel decode is ~3x faster than single thread decode and around 30% faster than tile parallel decode for frame parallel encoded video on both Android and desktop with 4 threads. Decode speed is scalable to threads too which means decode could be even faster with more threads. Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e	2014-10-22 10:50:58 -07:00
Hui Su	8947b18fa3	Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV. Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb	2014-10-21 15:41:37 -07:00
Yunqing Wang	687c56e802	Merge "SAD32xh and SAD64xh for AVX2"	2014-10-20 12:37:55 -07:00
levytamar82	7045aec00a	SAD32xh and SAD64xh for AVX2 All SAD functions that process 32 or more consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64. vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%; together they gave an overall ~2.3% user level gain. Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd	2014-10-19 13:59:10 -07:00
Peter de Rivaz	73ae6e495c	Add highbitdepth function for vp9_avg_8x8 Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/ (`a92f987a6b`) on highbitdepth branch. Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8	2014-10-17 17:04:37 -07:00
James Zern	e9b8810b4d	move LFWorkerData allocation to VP9LfSync this removes an assumption that worker->data1 would be pointing to a TileWorkerData allocation. additionally, within the multi-threaded loopfilter pass VP9LfSync as a parameter to the worker hook, removing the need for a shadow pointer in LFWorkerData. Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88	2014-10-16 18:55:46 +02:00
Alex Converse	00a9671bbd	Merge "Add a 32-bit friendly sse2 quantizer."	2014-10-14 14:35:02 -07:00
Alex Converse	7497d2fb23	Add a 32-bit friendly sse2 quantizer. This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448	2014-10-14 11:37:41 -07:00
hkuang	c38a8edf16	Merge "Remove extra line."	2014-10-14 11:05:01 -07:00
Adrian Grange	f7c336aa19	Merge "Remove mi_grid_base_array from VP9_COMMON (unused)"	2014-10-14 07:50:17 -07:00
hkuang	c5fd035ce0	Use pre increment. Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f	2014-10-13 14:07:03 -07:00
Adrian Grange	83b63d573a	Remove mi_grid_base_array from VP9_COMMON (unused) Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0	2014-10-13 11:54:05 -07:00
hkuang	dbe91de6d4	Remove extra line. Change-Id: I5e79c276d8953ae17cd35b2846e6e40660c037c3	2014-10-10 14:59:04 -07:00
hkuang	effc1a6f56	Correct the code format. Change-Id: If2de420f8123a4e8bf635dd29205dd74ee174eee	2014-10-09 17:57:45 -07:00
Deb Mukherjee	9a29fdbae7	Merge "Rename highbitdepth functions to use highbd prefix"	2014-10-09 15:39:56 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
James Zern	caa0f81914	vp9_rtcd_defs: fix vp9_avg_8x8 declaration vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd	2014-10-09 10:44:42 +02:00
Jingning Han	f6ff752c63	Merge "Clean up header files in vp9_blockd.h and related files"	2014-10-08 15:25:09 -07:00
Jingning Han	1c3398675f	Merge "Use #define statement for MAX_MB_PLANE"	2014-10-08 15:24:56 -07:00
Jim Bankoski	20254d1daa	Merge "experimental : partition using 1/8 x 1/8 image"	2014-10-08 09:04:26 -07:00
Jim Bankoski	0ce51d823f	experimental : partition using 1/8 x 1/8 image The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624	2014-10-07 16:36:14 -07:00
Jingning Han	608c4acc1f	Merge "Remove vp9_blockd.h from vp9_common_data.c"	2014-10-07 15:34:07 -07:00
Jingning Han	3bbec7b422	Merge "Replace mi_width_log2() with mi_width_log2_lookup table"	2014-10-07 15:33:52 -07:00
Jingning Han	27c9577f8e	Merge "Take out repeated block width/height lookup functions"	2014-10-07 15:33:45 -07:00
Jingning Han	6ad272cb84	Clean up header files in vp9_blockd.h and related files This commit breaks the overly broad header files into more targeted and smaller ones, to help better structure the system layout. Change-Id: I7b24559d3ea6e582cf5d9bbe8f71459f9824d71b	2014-10-07 15:17:10 -07:00
Jingning Han	3c28fb768d	Use #define statement for MAX_MB_PLANE Change-Id: I3a7f83ab1dbfcedc8a82fe798c2fa30dd9c7d696	2014-10-07 15:00:22 -07:00
Jingning Han	d7febaf5c5	Remove extra empty line Change-Id: I6f2865bb8ba9295f5c45a4cad065aecbe1e63c32	2014-10-07 14:06:54 -07:00
Jingning Han	bd9706506f	Merge "Move inter filter defs to vp9_filter.h"	2014-10-07 13:42:26 -07:00
Jingning Han	ebd724852e	Remove vp9_blockd.h from vp9_common_data.c The basic data defs should be above block operation level. Change-Id: I7dd9836d01120ab75e0c472baac9f15495ed0db5	2014-10-07 13:02:54 -07:00
Jingning Han	7ee58985bd	Replace mi_width_log2() with mi_width_log2_lookup table Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5	2014-10-07 12:45:25 -07:00
Jingning Han	b66f7016c1	Take out repeated block width/height lookup functions The functions b_width_log2 and b_height_log2 only do direct table fetch. This commit unifies such use cases by using the table directly and removes these functions. Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b	2014-10-07 12:33:07 -07:00
Jingning Han	5d9cdac087	Move inter filter defs to vp9_filter.h Add comments on the use case of these definitions. Further reduce the scope of header file in vp9_context_tree.h. Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75	2014-10-07 12:16:37 -07:00
Deb Mukherjee	cfc337aae8	Merge "Resolves some static analysis / undefined warnings"	2014-10-07 12:15:26 -07:00
Deb Mukherjee	fced63ed30	Resolves some static analysis / undefined warnings Also fixes a case of distortion becoming negative and messing up the RDCOST computation. Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53	2014-10-07 11:20:56 -07:00
JackyChen	a9f479682a	Merge "Add SSE2 code and unit test for VP9 denoiser."	2014-10-07 10:51:55 -07:00
JackyChen	80465dae88	Add SSE2 code and unit test for VP9 denoiser. This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20% (using C code vs using SSE2 code), varying for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block sizes. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d	2014-10-06 15:27:40 -07:00
Jingning Han	12344f2697	Add range check in inverse ADST 16x16 Bit-stream clarification related to Issue 868. Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514	2014-10-06 11:07:58 -07:00
Deb Mukherjee	3bcc2af8cd	Some data type changes in vp9_idct.c Resolves a visual studio warning, and includes some cleanups. Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d	2014-10-04 16:03:04 -07:00
Deb Mukherjee	8a01074d04	Merge "Incorporate WRAPLOW macro into non-highbitdepth tx"	2014-10-03 12:45:39 -07:00
Deb Mukherjee	d50716face	Incorporate WRAPLOW macro into non-highbitdepth tx Incorporates the WRAPLOW macro into the non-highbitdepth transforms to aid hardware verification between a software C model and an intended hardware implementation through the use of the configure options: --enable-experimental --enable-emulate-hardware. Note that to avoid further discrepancies between the sse/sse2 implementations of the transforms and the C implementation, when the emulate hardware option is invoked, we also disable sse/sse2/etc. Also includes some minor cleanups/renaming etc. Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287	2014-10-03 11:38:05 -07:00
Yaowu Xu	f809475c73	Merge "Make iscan and scan neighbor arrays static const."	2014-10-02 15:15:58 -07:00
Yaowu Xu	9712bc691d	Make iscan and scan neighbor arrays static const. This commit changes the tables to be read only, which fixes issue #866 Change-Id: I85bbe03f9d344f50570f8c1c61699bdc5cee248f	2014-10-02 14:08:14 -07:00
Alexander Voronov	befc36d4a7	Fix invalid memory access in inter prediction (issue 853). Change-Id: I5a566d6ade720f212a60c0ad5d6f1ee1d1d37f2e	2014-10-02 18:57:47 +04:00
Jingning Han	c7d719325e	Merge "Remove redundant header file from vp9_idct.h"	2014-10-01 17:05:36 -07:00
Deb Mukherjee	30fbf23fda	Merge "High-bitdepth bugfixes"	2014-10-01 16:47:43 -07:00
Jingning Han	74c2997bc9	Remove redundant header file from vp9_idct.h Change-Id: Id92544762e7b96d3c729dfc8e04ecff91cbcc7f9	2014-10-01 14:58:27 -07:00
Deb Mukherjee	a160d72522	High-bitdepth bugfixes Miscellaneous bug-fixes for high bitdepth functionality. With this patch, high bit-depth profiles become mostly functional, except for an intermittent assert failure issue that is being tracked. Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c	2014-10-01 14:18:11 -07:00
Jingning Han	3d17f0d45f	Remove repeated vpx_integer.h from vp9_prob.h The file vpx_integer.h has been included and used in the parent file vp9_common.h. Change-Id: I9c65f08353576f9ef1e5ea17244fc5ca964ec002	2014-10-01 12:45:52 -07:00
Jingning Han	764c00ab50	Use precise header files in vp9_entropymv.h The commit cleans up the header files in vp9_entropymv.h. This file should only depend on vp9_mv.h and vp9_prob.h. Remove the giant vp9_blockd.h from header file list. Change-Id: I44cd26d2cfd10a16a9325778347dd53f888a874c	2014-10-01 12:41:08 -07:00
Deb Mukherjee	872b207b78	Moves transform type defines to vp9_common Moves transform type defines to vp9_common.h from vp9_idct.h so that they can be included in vp9_rtcd_defs.pl safely. Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb	2014-09-30 19:44:17 -07:00
James Zern	4a296e6baa	Revert "Fix compiling error in vp9_idct.h" This reverts commit `eafc8c9c40`. tran_low_t/tran_high_t don't belong in a public header, they're private. Similarly the public headers shouldn't rely on config defines, vpx_config.h isn't installed. Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad	2014-09-30 16:08:55 -07:00
Jingning Han	0829d2be7f	Remove redundant header file declaration Some header file in vp9_idct.c has been included in vp9_idct.h. This commit removes these redundant declarations. Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445	2014-09-30 09:13:00 -07:00
Jingning Han	eafc8c9c40	Fix compiling error in vp9_idct.h This commit fixes a compiling error in vp9_idct.h, where the codec checks that the intermediate steps of transformation fit within 16-bit length. The issue was due to broken file dependency. Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc	2014-09-30 09:11:59 -07:00
Deb Mukherjee	9ed23de13f	Miscellaneous decoder changes for high bitdepth Also includes yv12 config changes. Change-Id: Iacf40d8bf486815b54c32a127ce3cd4516b7e44f	2014-09-29 11:27:45 -07:00
hkuang	c53a95ad1d	Avoid calling vp9_is_scaled two times in a function. Use a local variable to hold the result of vp9_is_scaled. Change-Id: I5e203909805923e20eefef596bc84424da47dbe2	2014-09-25 11:52:16 -07:00
Yaowu Xu	845d4f333d	Fix a couple of comments The first comment is obsolete given the way is now normative in VP9 bitstream. The second comment line was too long. Change-Id: I6546585babf60d466485ddcf2daa6d2fa79e999a	2014-09-25 08:24:16 -07:00
Yaowu Xu	d237d483a5	Correct the condition for border extension As reported in issue #850, the condition for border extension was not complete. This commit added the case when the scaling is enabled. This fixes issue #850. Change-Id: I67768b23f0dcc4ac9a9aa0a0825b0fe8cb85a72e	2014-09-24 11:26:40 -07:00
Yaowu Xu	148c57d231	Merge "Fix invalid memory access on 2x downscale."	2014-09-24 09:58:05 -07:00
Alexander Voronov	eafd842a3e	Fix incorrect subsampling used in VP9 non420 loopfilter. Change-Id: Ia959e24b4676242c80a8867d2c39a6fee90f71a5	2014-09-24 17:01:09 +04:00
Deb Mukherjee	e2a90c0b21	Merge "High bit-depth loop/arf/postproc filter functions"	2014-09-23 17:26:32 -07:00
Deb Mukherjee	931ed516ba	High bit-depth loop/arf/postproc filter functions Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3	2014-09-23 16:20:43 -07:00
hkuang	c70cea97ac	Remove mi_grid_* structures. mi_grid_* are arrays of pointer to pointer. They save the pointers that point to the MIs in cm->mi. But they are unnecessary and complicated. The original goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer inside MODE_INFO_t, same goal could be achieved. This commit totally removes the mi_grid_* structures. But there are still many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit will do on-demand MODE_INFO_t allocation in order to save these memories. Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6	2014-09-19 21:27:11 -07:00
Deb Mukherjee	822b51609b	High bit-depth coefficient coding functions Tokenization and Detokenization enhancements for 10/12 bit Change-Id: I3c269ec30f8eb160ee024905638a193975237559	2014-09-19 15:21:24 -07:00
Frank Galligan	49dc7b05d0	Merge "FIX: vp9_loopfilter_intrin_sse2.c"	2014-09-18 15:10:16 -07:00
Scott LaVarnway	13284311eb	FIX: vp9_loopfilter_intrin_sse2.c Fixes Visual Studio build failures Change-Id: I233719cd63b3ad0db16e2834bf1d7ea1df805880	2014-09-18 13:09:13 -07:00
Deb Mukherjee	6d0ee9860e	Merge "Adds high bitdepth convolve, interpred & scaling"	2014-09-18 10:52:23 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Frank Galligan	4e066299d9	Merge "Improved mb_lpf_horizontal_edge_w_sse2_16() #2 "	2014-09-17 18:52:30 -07:00
Scott LaVarnway	217e3cb1fb	Improved mb_lpf_horizontal_edge_w_sse2_16() #2 The decoder performance improved up to 1% for the test clips used. Change-Id: I4621112bdccfba01640322facfa4ba8da8290ea5	2014-09-17 17:25:20 -07:00
Deb Mukherjee	7d0e4f9ad1	Resolves a few gcc warnings clang is fine. Change-Id: Ia4e9ff17ea3b86bc87dca35828ee7ce45bea6994	2014-09-16 22:44:40 -07:00
Deb Mukherjee	f7cf05cfe0	Merge "Adding high-bitdepth intra prediction functions"	2014-09-16 17:10:24 -07:00
Frank Galligan	ecd7e3d2b7	Merge "Remove memset of every external frame buffer."	2014-09-16 15:17:26 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Deb Mukherjee	5cd0aab81a	Adds high bitdepth quantization functions Adds various high bitdepth quantization functions. Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c	2014-09-16 14:55:37 -07:00
Yaowu Xu	601f3a886e	Fix a performance regression This commit adds back sse2 or ssse3 optimized versions of a couple of functions, fixes a ~10% performance regression. Change-Id: I049786906e5a641224dced63c6492aec9d86d183	2014-09-16 11:18:46 -07:00
Frank Galligan	175d9dfe0a	Remove memset of every external frame buffer. Libvpx was memseting every external frame buffer before decode. This was to work around a valgrind issue in our C loop filter. Most of the time this was not needed and we have noticed some significant performance loss on some platforms. Now we require the application to zero out the buffers if it is using external frame buffers. Change-Id: I7330d00a315e65137ed30edd5f813e8929b76242	2014-09-15 15:37:36 -07:00
Alexander Voronov	29071a418e	Fix invalid memory access on 2x downscale. The issue was discovered on bitstream with 2x vertical downscale. For zero MVs, y_pad is set to 1 only when vertical convolution is required. The original code assumes that for y_step_q4 == 32 we don't perform vertical convolution. But vp9_setup_scale_factors_for_frame() sets convolve functions so that when x_step and y_step are both not equal to 16, convolve in both directions is performed. And convolve() unconditionally subtracts one stride from source pointer when it calls convolve_horiz(). This leads to invalid memory access. Change-Id: I882dfa6081a58e172b5ffa55842bfcd6727f10bf	2014-09-15 17:50:20 +04:00
Jingning Han	82fad6f4b6	Merge "Add a note for enum values of MV_REFERENCE_FRAME"	2014-09-13 10:42:45 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Many of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Deb Mukherjee	1e4136d35d	Adds high bit depth sad and variance functions Moves high bit depth sad/var functions from highbitdepth branch to master. Change-Id: If03845d8ef9c9c494e13350e7a587c289306b94d	2014-09-11 17:30:44 -07:00
Johann	ac2f2e7855	Merge "Allow specifying opt dependencies"	2014-09-11 16:02:41 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00
Jingning Han	3ef9786b7e	Add a note for enum values of MV_REFERENCE_FRAME Change-Id: Ifaf6738f26e86ded6eb6ea1465bad7a229612999	2014-09-11 10:55:42 -07:00
Jim Bankoski	0e66848081	Merge "LoopFilterWorkerData: remove misleading 'const'"	2014-09-10 06:33:51 -07:00
James Zern	2215d2f135	Merge changes If8887e1d,I36bfc9c8,I3d1e6c42 * changes: vp9_dthread: simplify loop_filter_row_worker signature simplify vp9_loop_filter_worker signature vp9_decodeframe: simplify tile_work_hook signature	2014-09-09 16:50:28 -07:00
Dmitry Kovalev	8e205a2a09	Merge "Cleaning up and speeding up vp9_idct32x32_1024_add_sse2()."	2014-09-09 12:50:23 -07:00
James Zern	7b572c9806	LoopFilterWorkerData: remove misleading 'const' 'frame_buffer' is modified indirectly via 'planes'. + do the same for vp9_loop_filter_rows Change-Id: Ibb7daa2e261064e4a5317a2969e3490e59891b82	2014-09-08 20:06:48 -07:00
James Zern	48662747bd	simplify vp9_loop_filter_worker signature use the type names directly in the function declaration rather than (void arg1, void arg2) Change-Id: I36bfc9c886310ce370bf0ca7c679ebd6e95109cc	2014-09-08 19:53:46 -07:00
Dmitry Kovalev	980abf6078	Fixing Mac OS build. Change-Id: Ifae8906185a868a07685eb7a7da2484af95e70a7	2014-09-08 08:53:12 -07:00
Dmitry Kovalev	70092af5c0	Cleaning up and speeding up vp9_idct32x32_1024_add_sse2(). Change-Id: If91017b792572c9db6e257011ca307bef8428486	2014-09-05 18:12:30 -07:00
Dmitry Kovalev	89963bf586	Merge "Removing postproc mmx code."	2014-09-05 18:11:08 -07:00
Dmitry Kovalev	54bec0971f	Merge "Initializing intra modes without vpx_once()."	2014-09-05 12:03:36 -07:00
Dmitry Kovalev	1100e262c5	Removing postproc mmx code. Removed functions: * vp9_post_proc_down_and_across_mmx * vp9_mbpost_proc_down_mmx * vp9_plane_add_noise_mmx They all have sse2 equivalent. Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff	2014-09-05 11:52:50 -07:00
James Zern	a8083449e9	fix x86-darwin* build vp9_variance_sse2.c contains a mix of intrinsics and references to assembly which uses x86inc.asm; it's conditionally included as a result. Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec	2014-09-04 23:32:13 -07:00
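
Note on commit 8f9d94ec17: the byte rearrangement described in that message (replacing one PSHUFB with PUNPCKLBW + PUNPCKHBW + PALIGNR) can be checked with a small scalar model. This is only an illustrative sketch, not the libvpx assembly: the helper names (pshufb_model, punpcklbw_model, punpckhbw_model, palignr_model, interleave_matches_pshufb) and the test data are hypothetical, and each helper simply emulates the documented byte-level behaviour of the corresponding instruction on 16-byte arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB: each output byte is selected by a mask index;
 * a mask byte with bit 7 set produces zero. */
static void pshufb_model(const uint8_t src[16], const uint8_t mask[16],
                         uint8_t out[16]) {
  for (int i = 0; i < 16; ++i)
    out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
}

/* Scalar model of PUNPCKLBW: interleave the low 8 bytes of a and b. */
static void punpcklbw_model(const uint8_t a[16], const uint8_t b[16],
                            uint8_t out[16]) {
  for (int i = 0; i < 8; ++i) {
    out[2 * i] = a[i];
    out[2 * i + 1] = b[i];
  }
}

/* Scalar model of PUNPCKHBW: interleave the high 8 bytes of a and b. */
static void punpckhbw_model(const uint8_t a[16], const uint8_t b[16],
                            uint8_t out[16]) {
  for (int i = 0; i < 8; ++i) {
    out[2 * i] = a[8 + i];
    out[2 * i + 1] = b[8 + i];
  }
}

/* Scalar model of PALIGNR hi, lo, imm: form the 32-byte value hi:lo
 * (lo occupies the low 16 bytes), shift it right by imm bytes, and keep
 * the low 16 bytes. Only imm <= 16 is modelled here. */
static void palignr_model(const uint8_t hi[16], const uint8_t lo[16],
                          unsigned imm, uint8_t out[16]) {
  uint8_t cat[32];
  assert(imm <= 16);
  memcpy(cat, lo, 16);
  memcpy(cat + 16, hi, 16);
  memcpy(out, cat + imm, 16);
}

/* Hypothetical self-check: returns 1 when the unpack + align sequence
 * yields the same bytes as PSHUFB with the filter mask quoted in the
 * commit message. */
int interleave_matches_pshufb(void) {
  static const uint8_t filter[16] = {0, 1, 1, 2, 2, 3, 3, 4,
                                     4, 5, 5, 6, 6, 7, 7, 8};
  uint8_t reg[16], reg1[16], reg2[16];
  uint8_t via_shuffle[16], via_unpack[16];

  /* Arbitrary stand-in pixel data (an assumption for illustration). */
  for (int i = 0; i < 16; ++i) reg[i] = (uint8_t)(100 + i);

  pshufb_model(reg, filter, via_shuffle);   /* original approach */

  punpcklbw_model(reg, reg, reg1);          /* REG1 = 0,0,1,1,...,7,7 */
  punpckhbw_model(reg, reg, reg2);          /* REG2 = 8,8,9,9,...,15,15 */
  palignr_model(reg2, reg1, 1, via_unpack); /* drop first byte of REG1 */

  return memcmp(via_shuffle, via_unpack, 16) == 0;
}
```

Calling interleave_matches_pshufb() from a test harness confirms only that the two sequences produce identical bytes for this mask; the latency and throughput figures quoted in the commit message are properties of the hardware and are not modelled.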


2699 Commits