generic-library/vpx

Author	SHA1	Message	Date
Parag Salasakar	ef51c1ab5b	mips msa vp9 convolve8 hv optimization average improvement ~5x-8x Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b	2015-04-21 09:17:49 +05:30
Parag Salasakar	2e36149ccd	Merge "mips msa vp9 convolve8 vert optimization"	2015-04-18 23:39:25 -07:00
Parag Salasakar	27d083c1b9	mips msa vp9 convolve8 vert optimization average improvement ~6x-10x Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa	2015-04-18 08:13:04 +05:30
Marco Paniconi	f76ccce5bc	Revert "Revert "Force_split on 16x16 blocks in variance partition."" This reverts commit `004b9d83e3` Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743	2015-04-16 17:52:13 -07:00
Yunqing Wang	004b9d83e3	Revert "Force_split on 16x16 blocks in variance partition." This reverts commit `eb8c667570`. The patch caused mismatch while using multi-threads. Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be	2015-04-14 15:19:31 -07:00
Marco	eb8c667570	Force_split on 16x16 blocks in variance partition. Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks. Also increase variance threshold for 32x32, and add exit condiiton in choose_partition (with very safe threshold) based on sad used to select reference frame. Some visual improvement near moving boundaries. Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%. Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip. Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577	2015-04-13 12:05:07 -07:00
Jingning Han	93d9c50419	Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"	2015-04-09 11:16:11 -07:00
Jingning Han	7f629dfca4	SSSE3 assembly implementation of 8x8 Hadamard transform It uses about 10% less CPU cycles than the SSE2 intrinsic implementation. Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499	2015-04-04 09:59:37 -07:00
James Zern	44e3640923	Merge "vp9: enable sse4 sad functions"	2015-04-03 14:57:52 -07:00
James Zern	b644384bb5	Merge "vp9: fix high-bitdepth NEON build"	2015-04-01 23:36:17 -07:00
Jingning Han	1470529f62	Refactor block_yrd function for RTC coding mode This commit separates Hadamard transform/quantization operations from rate and distortion computation in block_yrd. This allows one to skip SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips repeated computation of sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10	2015-04-01 12:00:43 -07:00
James Zern	14e24a1297	vp9: enable sse4 sad functions sse4 isn't set by configure or used in rtcd, correct the sad entries to use sse4_1 without changing the signatures for now. this was done in vp8 post-vp9 branch. Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271	2015-03-31 21:00:55 -07:00
James Zern	8845334097	vp9: fix high-bitdepth NEON build remove incorrect specializations in rtcd and update a configuration check in partial_idct_test.cc Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0	2015-03-31 17:45:25 -07:00
Jingning Han	26d3d3af6a	Enable 16x16 Hadamard transform in SATD based mode decision This commit replaces the 16x16 2D-DCT transform with Hadamard transform for RTC coding mode. It reduces the CPU cycles cost on 16x16 transform by 5X. Overall it makes the speed -6 encoding speed 1.5% faster without compromise on compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b	2015-03-30 15:43:31 -07:00
Jingning Han	8c411f74e0	Hadamard transform based coding mode decision process This commit uses Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates. For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for niklas720p. This will introduce extra encoding cycle costs at this point. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375	2015-03-30 14:46:05 -07:00
Jingning Han	1790d45252	Use variance metric for integral projection vector match This commit replaces the SAD with variance as metric for the integral projection vector match. It improves the search accuracy in the presence of slight light change. The average speed -6 compression performance for rtc set is improved by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d	2015-03-01 10:42:56 -08:00
Jingning Han	ed2dc59c1b	Integral projection based motion estimation This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region is projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The overall CPU cycle cost due to this additional 64x64 block match (SSE2 version) takes around 2% at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were tested in RTC speed -6 setting: cloud 1080p 500 kbps 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster) pedestrian_area 1080p 500 kbps 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings) blue_sky 1080p 500 kbps 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster) jimred 400 kbps 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower) Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c	2015-02-19 13:47:19 -08:00
Frank Galligan	e3167f7fbf	Add vp9_sad32x32x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c	2015-01-27 08:54:00 -08:00
Frank Galligan	9f574d0316	Add vp9_sad16x16x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9	2015-01-27 08:42:17 -08:00
Frank Galligan	54fa956715	Add vp9_sad64x64x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58	2015-01-27 08:33:40 -08:00
Frank Galligan	9f6eba419a	Add Neon intrinsic vp9_fdct8x8_quant_neon On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51	2015-01-24 22:49:50 -08:00
JackyChen	65f60f8e8c	Merge "SSE2 code for the filter in MFQE."	2015-01-23 11:08:16 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
Frank Galligan	6e7e1cf32f	Add Neon intrinsics for vp9_avg_8x8_neon On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee	2015-01-15 15:32:40 -08:00
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
Johann	377b6682f9	Disable vp9 _8_ loopfilters Investigating https://code.google.com/p/chromium/issues/detail?id=443839 Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb	2015-01-06 19:26:11 -08:00
Jingning Han	d0f2377027	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."" This reverts commit `9946ee23e0`. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07	2014-12-22 10:09:25 -08:00
Paul Wilkins	9946ee23e0	Revert "Removal of legacy zbin_extra / zbin_oq_value." This reverts commit `e9b586e21b`. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4	2014-12-19 15:02:58 +00:00
Paul Wilkins	e9b586e21b	Removal of legacy zbin_extra / zbin_oq_value. zbin extra / zbin_oq_value was widely passed around, hence removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5	2014-12-18 16:49:11 +00:00
James Yu	aeeaa67987	VP9 common for ARMv8 by using NEON intrinsics 15 Re-write - vp9_lpf_horizontal_4_dual_neon in vp9_loopfilter_16_neon.c Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 20:00:26 -08:00
James Yu	aa8dd897c1	VP9 common for ARMv8 by using NEON intrinsics 16 Add vp9_reconintra_neon.c - vp9_v_predictor_4x4_neon - vp9_v_predictor_8x8_neon - vp9_v_predictor_16x16_neon - vp9_v_predictor_32x32_neon - vp9_h_predictor_4x4_neon - vp9_h_predictor_8x8_neon - vp9_h_predictor_16x16_neon - vp9_h_predictor_32x32_neon - vp9_tm_predictor_4x4_neon - vp9_tm_predictor_8x8_neon - vp9_tm_predictor_16x16_neon - vp9_tm_predictor_32x32_neon Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 12:57:52 -08:00
James Yu	4f856cd7fa	VP9 common for ARMv8 by using NEON intrinsics 06 Add vp9_iht8x8_add_neon.c - vp9_iht8x8_64_add_neon The assembly did not previously implement tx_type 0 BUG=716 Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:18:06 -08:00
James Yu	6b71013277	VP9 common for ARMv8 by using NEON intrinsics 05 Add vp9_iht4x4_add_neon.c - vp9_iht4x4_16_add_neon The assembly did not previously implement tx_type 0 BUG=715 Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:16:19 -08:00
James Yu	3f7c12dab9	VP9 common for ARMv8 by using NEON intrinsics 18 Add vp9_idct32x32_add_neon.c - vp9_idct32x32_1024_add_neon Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:20:04 -08:00
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Peter de Rivaz	a306bd8274	Use the RTC optimizations when in high bitdepth mode. Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5	2014-12-04 15:48:42 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit `db7192e0b0`) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00
Jingning Han	c42715b721	Enable ssse3 version of vp9_fdct8x8_quant It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097	2014-11-19 22:14:19 -08:00
Jingning Han	bf63652d34	Merge "Combine fdct8x8 and quantization process"	2014-11-19 11:17:44 -08:00
Jingning Han	ce77a7bcb0	Merge "Add sse2 version for vp9_quantize_fp"	2014-11-19 11:17:36 -08:00
Jingning Han	c6908fd5f7	Combine fdct8x8 and quantization process This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387	2014-11-18 18:10:56 -08:00
Jingning Han	2d3cc8ea2b	Add sse2 version for vp9_quantize_fp vp9_quantize_fp is the quantization process used by rtc coding mode. This commit adds a sse2 implementation of it. The implementation is modified based on vp9_quantize_b_sse2. No speed difference from ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a	2014-11-18 09:01:41 -08:00
Yaowu Xu	1687c47bfd	change to call vp9_refining_search_sad() directly The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5	2014-11-17 11:30:17 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit `d7422b2b1e`) (cherry picked from commit `6d741e4d76`)	2014-11-14 15:18:53 -08:00
Peter de Rivaz	7eee487c00	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit `b1a6f6b9cb`)	2014-11-12 14:25:45 -08:00
Yunqing Wang	687c56e802	Merge "SAD32xh and SAD64xh for AVX2"	2014-10-20 12:37:55 -07:00
levytamar82	7045aec00a	SAD32xh and SAD64xh for AVX2 All sad function that process above 32 consecutive elements are optimized for AVX2: vp9_sad64x64 vp9_sad64x32 vp9_sad32x64 vp9_sad32x32 vp9_sad32x16 vp9_sad64x64_avg vp9_sad64x32_avg vp9_sad32x64_avg vp9_sad32x32_avg vp9_sad32x16_avg The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64 vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90% both of them gave and overall ~2.3% user level gain Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd	2014-10-19 13:59:10 -07:00
Peter de Rivaz	73ae6e495c	Add highbitdepth function for vp9_avg_8x8 Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/ (`a92f987a6b`) on highbitdepth branch. Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8	2014-10-17 17:04:37 -07:00
Alex Converse	7497d2fb23	Add a 32-bit friendly sse2 quantizer. This is based on the 64-bit ssse3 quantizer. 1.1x speedup for screen content at speed 7. Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448	2014-10-14 11:37:41 -07:00
Deb Mukherjee	9a29fdbae7	Merge "Rename highbitdepth functions to use highbd prefix"	2014-10-09 15:39:56 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
James Zern	caa0f81914	vp9_rtcd_defs: fix vp9_avg_8x8 declaration vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd	2014-10-09 10:44:42 +02:00
Jim Bankoski	0ce51d823f	experimental : partition using 1/8 x 1/8 image The concept: There's too much noise in source pixels for variance and at low bitrate the reconstructed looks nothing like the source so we have problems getting good partitionings with either. This skirts the issue by using a box blur scaled down version for variance calculations. To compare against source_var_ moved keyframe to be rd based like source_var. Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624	2014-10-07 16:36:14 -07:00
JackyChen	a9f479682a	Merge "Add SSE2 code and unit test for VP9 denoiser."	2014-10-07 10:51:55 -07:00
JackyChen	80465dae88	Add SSE2 code and unit test for VP9 denoiser. This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are only 16x16 blocks in denoiser, while in VP9, there are 13 different block sizes. By adding this SSE2 code, the improvement of encoder speed is around 20%(using C code vs using SSE2 code), vary for different clips. The unit test for VP9 denoiser is to confirm that the SSE2 code is bit-exact with the C code. The unit test covers all block size. Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d	2014-10-06 15:27:40 -07:00
Deb Mukherjee	d50716face	Incorporate WRAPLOW macro into non-highbitdepth tx Incorporates the WRAPLOW macro into the non-highbitdepth transforms to aid hardware verification between a software C model and an intended hardware implementation though the use of the configure options: --enable-experimental --enable-emulate-hardware. Note that to avoid further discrepancies between the sse/sse2 implementations of the transforms and the C implementation, when the emulate hardware option is invoked, we also disable sse/sse2/etc. Also incudes some minor cleanups/renaming etc. Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287	2014-10-03 11:38:05 -07:00
Deb Mukherjee	a160d72522	High-bitdepth bugfixes Miscellaneous bug-fixes for high bitdepth functionality. With this patch, high bit-depth profiles become mostly functional, except for an intermittent assert failure issue that is being tracked. Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c	2014-10-01 14:18:11 -07:00
Deb Mukherjee	872b207b78	Moves transform type defines to vp9_common Moves transform type defines to vp9_common.h from vp9_idct.h so that they can be included in vp9_rtcd_defs.pl safely. Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb	2014-09-30 19:44:17 -07:00
James Zern	4a296e6baa	Revert "Fix compiling error in vp9_idct.h" This reverts commit `eafc8c9c40`. tran_low_t/tran_high_t don't belong in a public header, they're private. Similarly the public headers shouldn't rely on config defines, vpx_config.h isn't installed. Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad	2014-09-30 16:08:55 -07:00
Jingning Han	eafc8c9c40	Fix compiling error in vp9_idct.h This commit fixes a compiling error in vp9_idct.h, where the codec checks that the intermediate steps of transformation fit within 16-bit length. The issue was due to broken file dependency. Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc	2014-09-30 09:11:59 -07:00
Deb Mukherjee	931ed516ba	High bit-depth loop/arf/postproc filter functions Adds high-bitdepth loopfilter, temporal filter and postproc functions Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3	2014-09-23 16:20:43 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Deb Mukherjee	81a8138fc3	Adding high-bitdepth intra prediction functions Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111	2014-09-16 15:04:39 -07:00
Deb Mukherjee	5cd0aab81a	Adds high bitdepth quantization functions Adds various high bitdepth quantization functions. Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c	2014-09-16 14:55:37 -07:00
Yaowu Xu	601f3a886e	Fix a performance regression This commit adds back sse2 or ssse3 optimized versio of a couple of functions, fixes a ~10% performance regression. Change-Id: I049786906e5a641224dced63c6492aec9d86d183	2014-09-16 11:18:46 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform cofficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map tp int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Deb Mukherjee	1e4136d35d	Adds high bit depth sad and variance functions Moves high bit depth sad/var functions from highbitdepth branch to master. Change-Id: If03845d8ef9c9c494e13350e7a587c289306b94d	2014-09-11 17:30:44 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00
Dmitry Kovalev	980abf6078	Fixing Mac OS build. Change-Id: Ifae8906185a868a07685eb7a7da2484af95e70a7	2014-09-08 08:53:12 -07:00
Dmitry Kovalev	89963bf586	Merge "Removing postproc mmx code."	2014-09-05 18:11:08 -07:00
Dmitry Kovalev	1100e262c5	Removing postproc mmx code. Removed functions: * vp9_post_proc_down_and_across_mmx * vp9_mbpost_proc_down_mmx * vp9_plane_add_noise_mmx They all have sse2 equivalent. Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff	2014-09-05 11:52:50 -07:00
James Zern	a8083449e9	fix x86-darwin* build vp9_variance_sse2.c contains a mix of intrinsics and references to assembly which uses x86inc.asm; it's conditionally included as a result. Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec	2014-09-04 23:32:13 -07:00
Dmitry Kovalev	490943552f	Removing unused function prototypes. Change-Id: Ia5e383e2cf18052f6f1eacf8b9495ab8e4d58878	2014-09-04 14:26:30 -07:00
Dmitry Kovalev	48197f0a70	Adding sse2 variant for vp9_mse{8x8, 8x16, 16x8}. Change-Id: I6786d25ce4f32b8d8912f2d239a45ca15b310c4b	2014-09-03 19:02:14 -07:00
Dmitry Kovalev	318fc0c34f	Removing MMX SAD calculation code. Removed functions: * vp9_sad_16x16_mmx * vp9_sad_8x16_mmx * vp9_sad_16x8_mmx * vp9_sad_8x8_mmx * vp9_sad_4x4_mmx Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3	2014-09-02 14:41:36 -07:00
Dmitry Kovalev	12cd6f421d	Removing variance MMX code. Removed functions: * vp9_mse16x16_mmx * vp9_get_mb_ss_mmx * vp9_get4x4var_mmx * vp9_get8x8var_mmx * vp9_variance4x4_mmx * vp9_variance8x8_mmx * vp9_variance16x16_mmx * vp9_variance16x8_mmx * vp9_variance8x16_mmx They all have SSE2 equivalent. Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615	2014-08-29 10:26:42 -07:00
Yaowu Xu	23c88870ec	Merge "Fix bug 804"	2014-08-21 08:56:32 -07:00
levytamar82	69a5f5ecf7	Fix bug 807 in the sub_pixel_variance function the dst is aligned to 16 bytes and not to 32 bytes - now load unaligned data Change-Id: I2e0b9745543697efc56fefa32857ea10117af135	2014-08-07 18:51:02 -07:00
levytamar82	839911fb6d	Fix bug 804 A bug in Microsoft compiler was found in the function vp9_filter_block1d16_v8_avx2 and a workaround applied. the bug occur when there was 4 consecutive maddubs + min + adds intrinsic instructions. Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b	2014-08-07 15:09:24 -07:00
levytamar82	af10457e02	Fix bug 806 in the function sad32x32x4d and sad64x64x4d the source is aligned to 16 bytes and not to 32 bytes - the load is now unaligned. Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c	2014-08-07 14:13:30 -07:00
Frank Galligan	5f8fa13258	Merge "Added vp9_sad8x8_neon()"	2014-08-01 14:11:38 -07:00
Scott LaVarnway	98165ec074	Neon version of vp9_sub_pixel_variance8x8(), vp9_variance8x8(), and vp9_get8x8var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~1.2%. Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102	2014-08-01 11:35:55 -07:00
Frank Galligan	5487b6067c	Merge "Neon version of vp9_sub_pixel_variance32x32(),"	2014-08-01 09:46:37 -07:00
Scott LaVarnway	545be78136	Added vp9_sad8x8_neon() Change-Id: I3be8911121ef9a5f39f6c1a2e28f9e00972e0624	2014-08-01 06:36:18 -07:00
Scott LaVarnway	6f4b8dcdc2	Neon version of vp9_subtract_block() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.2% Change-Id: I8862497264142171b7efc32df1a67714a23539f4	2014-07-31 09:28:06 -07:00
Scott LaVarnway	d39448e2d4	Neon version of vp9_sub_pixel_variance32x32(), vp9_variance32x32(), and vp9_get32x32var(). Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548	2014-07-31 08:00:36 -07:00
Scott LaVarnway	d4a37db5b8	Neon version of vp9_quantize_fp() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~12.4% Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7	2014-07-30 09:33:46 -07:00
Scott LaVarnway	521cf7e879	Neon version of vp9_sub_pixel_variance16x16(), vp9_variance16x16(), and vp9_get16x16var(). On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~16.7%. Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb	2014-07-30 08:17:32 -07:00
Scott LaVarnway	d19d222db6	Added vp9_fdct8x8_neon(), vp9_fdct8x8_1_neon() On a Nexus 7, vpxenc (in realtime mode, speed -12) reported a performance improvement of ~3.7%. Change-Id: I428c72c40df82c6d537955e320a8debf99343004	2014-07-29 08:56:05 -07:00
levytamar82	4ba92dc5ab	Fix bug 805 Remove all the redundant dct functions (dct4x4, dct8x8) in avx2 except dct32x32 those functions were copied originally from dct_sse2 Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e	2014-07-28 15:46:01 -07:00
Scott LaVarnway	ba0652e83a	Merge "Added vp9_sad64x64_neon(), vp9_sad32x32_neon()"	2014-07-17 11:42:16 -07:00
Scott LaVarnway	696fa52eaa	Added vp9_sad64x64_neon(), vp9_sad32x32_neon() and vp9_sad16x16_neon() On a Nexus 7, vpxenc (in realtime mode, speed -6) reported a performance improvement of ~17%. Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85	2014-07-16 12:54:46 -07:00
Yunqing Wang	75cd57503d	Refactor vp9_diamond_search_sad function Currently, vp9_diamond_search_sadx4() is only called when sse3 is enabled, which is improper since sse2 optimization of sdx4df functions are available. Changed to always use vp9_diamond_search_sadx4(). Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717	2014-07-10 09:19:03 -07:00
Yunqing Wang	30117a576d	Refactor refining_search_sad code There are sse2 optimization of sdx4df functions. Instead of calling vp9_refining_search_sadx4 only when sse3 is enabled, call it always. Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9	2014-07-09 16:50:11 -07:00
Jingning Han	9ad1b9fc67	Re-design quantization process for 32x32 transform block This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gains of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b	2014-07-08 16:55:28 -07:00
Jingning Han	9ac2f66320	Re-design quantization process This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50	2014-07-01 17:00:07 -07:00
James Zern	88df435d6b	Merge "vp9_rtcd: correct avx2 references"	2014-06-16 17:39:13 -07:00
Jingning Han	d5ae43318e	Merge "Fast computation path for forward transform and quantization"	2014-06-12 11:59:52 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
James Zern	9f3a0dbb5e	vp9_rtcd: correct avx2 references s/"\$avx2_x86inc"/"avx2"/ avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a	2014-06-10 16:26:36 -07:00
James Zern	520cb3f39f	vp9_sub_pixel_variance: disable avx2 variants tests failing under Win32/Win64 + variance_test: add missing avx2 functions (partially disabled) Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d	2014-06-10 16:11:15 -07:00
James Zern	d3ff009d84	vp9_sad*x4d: disable avx2 variants tests failing under Win32/Win64 + sad_test: add missing avx2 functions (disabled) Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856	2014-06-10 16:10:12 -07:00
James Zern	dd9f502933	vp9_f(dct\|ht): disable avx2 variants tests failing under Win32/Win64 + dct16x16_test: add missing avx2 functions (partially disabled) exercises the forward transforms no idct/iht implementations, so the c-code is used Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61	2014-06-09 18:48:11 -07:00
James Zern	5704578f5f	convolve: disable avx2 variants tests failing under Win32/Win64 Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf	2014-06-09 18:42:03 -07:00
Jingning Han	0c4a4225ec	Merge "Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs"	2014-06-03 16:51:39 -07:00
Dmitry Kovalev	f7ff24cdd0	Reusing existing vp9_get{8x8, 16x16}var() instead of new ones. Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377	2014-05-29 11:14:06 -07:00
Jingning Han	6d21cbd20b	Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs This commit enables SSSE3 implementation of the inverse 2D-DCT with only first 10 coefficients non-zero. It reduces the runtime of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up. Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe	2014-05-28 10:53:33 -07:00
Yunqing Wang	1f2200080b	Revert "Making vp9_get_sse_sum_{8x8, 16x16} static." This reverts commit `e8bbb3d9db`. Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc	2014-05-27 13:37:08 -07:00
Deb Mukherjee	444f93945b	Merge "Remove Wextra warnings from vp9_sad.c"	2014-05-27 11:54:05 -07:00
Jingning Han	48b0891370	Inverse 16x16 2D-DCT SSSE3 implementation This commit enables the SSSE3 implementation of full inverse 16x16 2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles, about 7% speed-up. Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d	2014-05-23 15:09:35 -07:00
Deb Mukherjee	916550428d	Remove Wextra warnings from vp9_sad.c As a side-effect, the sad unit tests for VP8 and VP9 had to be separated. Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71	2014-05-22 22:21:16 -07:00
Deb Mukherjee	a185bc3350	Extends temporal filtering to work for 422 data This is needed for profiles 1 and 2. Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028	2014-05-20 15:19:40 -07:00
Jim Bankoski	ec82d2dfec	Merge "Revert "Remove Wextra warnings from vp9_sad.c""	2014-05-15 11:54:23 -07:00
Yunqing Wang	c661cf0dad	Merge "AVX2 To VP9 Block Error Optimization"	2014-05-15 11:29:29 -07:00
Jim Bankoski	a16794dd31	Revert "Remove Wextra warnings from vp9_sad.c" This reverts commit `7ab9a9587b` Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-VP8:Large./276/console Failed This patch did not address all the assembly issues some of the vp8 assembly counts on 5 arguments being passed in to this function: one example : vp8_sad8x16_wmt Please address or split this into vp9 and vp8 patches. Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca	2014-05-15 08:31:20 -07:00
levytamar82	1fbab853c8	AVX2 To VP9 Block Error Optimization vp9_block_error_sse2 can only handle 16 bytes at a time but the function requires to handle a sequence of 32 bytes at a time so each 16 bytes is handled in a different register. With AVX2 optimization the 32 bytes can be handled in one register instead of two in the SSE2 The vp9_block_error was optimized by 85%. The user level was optimized by 1.2% Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd	2014-05-14 11:51:07 -07:00
Deb Mukherjee	7ab9a9587b	Remove Wextra warnings from vp9_sad.c As a side-effect, the max_sad check is removed from the C-implementation of VP8, for consistency with VP9, and to ensure that the SAD tests common to VP8/VP9 pass. That will make the VP8 C implementation of sad a little slower but given that is rarely used in practice, the impact will be minimal. Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca	2014-05-14 03:17:31 -07:00
Johann	ce23931a3f	Only build neon assembly for armv7 targets Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477	2014-05-12 08:52:02 -07:00
Alex Converse	ec8a3272fa	Merge "Add an x86inc MMX fwht4x4."	2014-05-09 13:48:49 -07:00
Jingning Han	9412785b02	Merge changes I3edd4b95,I4514f974,Ie7fa4386 * changes: Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT Change eob threshold for partial inverse 8x8 2D-DCT to 12 SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero	2014-05-09 09:58:39 -07:00
Alex Converse	b5422fab46	Add an x86inc MMX fwht4x4. Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197	2014-05-08 12:01:27 -07:00
Jingning Han	41a350a83d	Change eob threshold for partial inverse 8x8 2D-DCT to 12 The scanning order has the first 12 coefficients of the 8x8 2D-DCT sitting in the top left 4x4 block. Hence the partial inverse 8x8 2D-DCT allows to handle cases with eob below 12. The overall runtime of the inverse 8x8 2D-DCT unit is reduced from 166 cycles (using SSE2) to 150 cycles (using SSSE3). Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2	2014-05-08 09:48:58 -07:00
Jingning Han	9e7b09bc5d	SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero This commit enables ssse3 assembly implementation of the 8x8 inverse 2D-DCT with only first 10 coefficients non-zero. The average runtime for this unit goes down from 198 cycles to 129 cycles (34.8% faster). Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7	2014-05-07 17:40:02 -07:00
Paul Wilkins	33b1c457ed	Revert "Add an MMX fwht4x4" Includes changes that are not compatible with VS windows builds. Amongst other things stdint.h is not supported in VS. This reverts commit `89fbf3de50`. Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd	2014-05-07 12:53:27 +01:00
Alex Converse	75d05d5ed4	Merge "Add an MMX fwht4x4"	2014-05-06 11:12:27 -07:00
Jingning Han	d289deb04c	Merge "SSSE3 implementation of full inverse 8x8 2D-DCT"	2014-05-06 09:17:22 -07:00
Dmitry Kovalev	e8bbb3d9db	Making vp9_get_sse_sum_{8x8, 16x16} static. Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a	2014-05-05 19:12:38 -07:00
Alex Converse	89fbf3de50	Add an MMX fwht4x4 7% faster encoding a desktop lossless at RT speed 4. Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64	2014-05-05 15:10:48 -07:00
Jingning Han	52ae97b6aa	SSSE3 implementation of full inverse 8x8 2D-DCT This commit enables SSSE3 version full inverse 8x8 2D-DCT and reconstruction. It makes the runtime of vp9_idct8x8_64_add down from 256 cycles (SSE2) to 246 cycles. Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3	2014-05-05 10:49:27 -07:00
Jingning Han	39761eb5d6	Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT"	2014-04-30 13:41:36 -07:00
Dmitry Kovalev	d2bc8816a1	Merge "Adding search_site_config struct."	2014-04-29 16:59:47 -07:00
Jingning Han	1eaa3a76dc	Enable SSSE3 implementation of 8x8 forward 2D-DCT Assembly implementation of ssse3 8x8 forward 2D-DCT. The current version is turned on only for x86_64. The average unit runtime goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster. This translates into about 1.5% speed-up for pedestrian_area 1080p at speed 2. Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4	2014-04-29 15:49:18 -07:00
Dmitry Kovalev	aa464eca5e	Adding search_site_config struct. Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf	2014-04-29 10:34:53 -07:00
Dmitry Kovalev	6e01079cc0	Removing unused vp9_variance_halfpixvar*() functions. Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3	2014-04-25 11:50:07 -07:00
Dmitry Kovalev	03e7deae4f	Removing unused vp9_sub_pixel_mse* functions. Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508	2014-04-24 11:49:12 -07:00
Dmitry Kovalev	63fa722179	Removing unused cost arguments from mcomp functions. Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434	2014-04-11 10:24:36 -07:00
Yunqing Wang	4e66293fcb	Use source frame difference to make partition decision Calculate the difference variance between last source frame and current source frame. The variance is calculated at 16x16 block level. The variances are compared to several thresholds to decide final partition sizes. An adaptive strategy is implemented to decide using SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions in the video. The switching test is done once every search_type_check_frequency frames. The selection of source_var_thresh needs to be investigated further later. RTC set Borg test showed 0.424% overall psnr gain, and 0.357% ssim gain. For clips with large enough static area, the encoding speedup is around 2% to 15%. Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156	2014-04-08 17:03:02 -07:00
levytamar82	0fa8b668c1	AVX2 SAD Optimization: 2 functions were optimized for avx2 by using full 256 bit register In order to handle 32 elements in parallel instead of only 16 in parallel: 1. vp9_sad32x32x4d 2. vp9_sad64x64x4d The function level gain is 66% and the user level gain is ~1%. Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb	2014-03-21 13:53:32 -07:00
James Zern	805078a1bf	build: convert rtcd.sh to perl significantly speeds up file generation. the goal of this change is to convert rtcd.sh to perl as directly as possible to allow for simple comparison. future changes can make it more perl-like. --- Linux [CREATE] vpx_scale_rtcd.h real 0m0.485s -> 0m0.022s [CREATE] vp8_rtcd.h real 0m4.619s -> 0m0.060s [CREATE] vp9_rtcd.h real 0m10.102s -> 0m0.087s Windows [CREATE] vpx_scale_rtcd.h real 0m8.360s -> 0m0.080s [CREATE] vp8_rtcd.h real 1m8.083s -> 0m0.160s [CREATE] vp9_rtcd.h real 2m6.489s -> 0m0.233s Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee	2014-03-03 14:47:11 -08:00

... 2 3 4 5 6 ...

304 Commits