generic-library/vpx

Author	SHA1	Message	Date
James Zern	587a71f1d6	rename vp9_dct32x32_sse2.c to vp9_dct32x32_sse2_impl.h this file shouldn't be built directly, it is included in vp9_dct_sse2.c to create a non-high-bitdepth and a high-bitdepth version silences missing prototype warnings for the unused FDCT32x32* functions Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51	2015-05-15 16:59:52 -07:00
James Zern	4ec47249bc	rename vp9_dct32x32_avx2.c to vp9_dct32x32_avx2_impl.h this file shouldn't be built directly, it is included in vp9_dct_avx2.c to create a non-high-bitdepth and a high-bitdepth version silences missing prototype warnings for the unused FDCT32x32* functions Change-Id: I4c19935c0e035b393be513bde735e9a78064a494	2015-05-15 16:47:51 -07:00
James Zern	330fba41e2	vp9 intrinsics: add vp9_rtcd include silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1	2015-05-15 10:43:47 -07:00
James Zern	43d5cc7fe1	vp9_variance_sse2: sync function signatures + include vp9_rtcd.h silences missing prototype warnings Change-Id: I77902f07a454029baad4fe5fe6fc37c65644e6f7	2015-05-15 10:43:47 -07:00
James Zern	8515e62e6b	vp9_dct_sse2: make some functions static silences missing prototype warnings Change-Id: I773b6a6b5bd7c57db18c3b17c519534f80e131de	2015-05-15 10:43:47 -07:00
Johann	1d7ccd5325	Relocate memory operations for common code With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca	2015-05-13 11:41:15 -07:00
James Zern	fd3658b0e4	replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79	2015-05-07 11:55:08 -07:00
Johann	d5d9289800	Move shared SAD code to vpx_dsp Create a new component, vpx_dsp, for code that can be shared between codecs. Move the SAD code into the component. This reduces the size of vpxenc/dec by 36k on x86_64 builds. Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a	2015-05-06 16:58:20 -07:00
James Zern	f58011ada5	vpx_mem: remove vpx_memset vestigial. replace instances with memset() which they already were being defined to. Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201	2015-04-28 20:00:59 -07:00
James Zern	f274c2199b	vpx_mem: remove vpx_memcpy vestigial. replace instances with memcpy() which they already were being defined to. Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c	2015-04-28 19:59:41 -07:00
Marco Paniconi	f76ccce5bc	Revert "Revert "Force_split on 16x16 blocks in variance partition."" This reverts commit 004b9d83e37d355f590a6976a27b7b845d19a869 Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743	2015-04-16 17:52:13 -07:00
Yunqing Wang	004b9d83e3	Revert "Force_split on 16x16 blocks in variance partition." This reverts commit eb8c667570aa83134c7db0690de9dbdde4d90291. The patch caused mismatch while using multi-threads. Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be	2015-04-14 15:19:31 -07:00
Marco	eb8c667570	Force_split on 16x16 blocks in variance partition. Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks. Also increase variance threshold for 32x32, and add exit condition in choose_partition (with very safe threshold) based on sad used to select reference frame. Some visual improvement near moving boundaries. Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%. Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip. Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577	2015-04-13 12:05:07 -07:00
Jingning Han	93d9c50419	Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"	2015-04-09 11:16:11 -07:00
Jingning Han	7f629dfca4	SSSE3 assembly implementation of 8x8 Hadamard transform It uses about 10% less CPU cycles than the SSE2 intrinsic implementation. Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499	2015-04-04 09:59:37 -07:00
Jingning Han	30e9c091c0	Merge "Tune SSSE3 assembly implementation to improve quantization speed"	2015-04-03 11:24:28 -07:00
Jingning Han	2149f214d5	Merge "Reduce required xmm number by one in block_error_fp"	2015-04-01 15:46:22 -07:00
Jingning Han	657cabe0f7	Tune SSSE3 assembly implementation to improve quantization speed Change-Id: If0ca8b25b4800d4336e6cbc97194cd9b01c5b5a3	2015-04-01 15:28:01 -07:00
Jingning Han	cf4447339e	Merge "Optimize quantization simd implementation"	2015-04-01 14:55:18 -07:00
Jingning Han	f2cf3c06a0	Reduce required xmm number by one in block_error_fp Use 6 xmms instead of 8. Change-Id: If976ad85d09191d2fb0565399d690f2869dbbcc7	2015-04-01 12:07:35 -07:00
Jingning Han	1470529f62	Refactor block_yrd function for RTC coding mode This commit separates Hadamard transform/quantization operations from rate and distortion computation in block_yrd. This allows one to skip SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips repeated computation of sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10	2015-04-01 12:00:43 -07:00
Jingning Han	eed1badedd	Optimize quantization simd implementation This commit allows the quantizer to compare the AC coefficients to the quantization step size to determine if further multiplication operations are needed. It makes the quantization process 20% faster without coding statistics change. Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a	2015-04-01 11:47:09 -07:00
Jingning Han	014fa45298	Use aligned copy in 8x8 Hadamard transform SSE2 This reduces the 8x8 Hadamard transform cycles by 20%. Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b	2015-03-31 10:21:52 -07:00
Jingning Han	34a996ac1e	Fix 8x8 Hadamard SSE2 implementation This commit fixes the SSE2 version 8x8 Hadamard transform alignment and makes it consistent with the C version. Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74	2015-03-30 15:54:08 -07:00
Jingning Han	26d3d3af6a	Enable 16x16 Hadamard transform in SATD based mode decision This commit replaces the 16x16 2D-DCT transform with Hadamard transform for RTC coding mode. It reduces the CPU cycles cost on 16x16 transform by 5X. Overall it makes the speed -6 encoding speed 1.5% faster without compromise on compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b	2015-03-30 15:43:31 -07:00
Jingning Han	8c411f74e0	Hadamard transform based coding mode decision process This commit uses Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates. For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for niklas720p. This will introduce extra encoding cycle costs at this point. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375	2015-03-30 14:46:05 -07:00
James Zern	388add965f	vp9_fdct8x8_quant_ssse3: quiet a static analysis warning add an assert to validate 'in' array size Change-Id: Ie5a24275c066d9dd59714f6104510abbd4850dc5	2015-03-18 14:33:43 -07:00
James Zern	198b039e2a	vp9_fdct8x8_quant_sse2: quiet a static analysis warning add an assert to validate 'in' array size Change-Id: Ib72946a86f34e1ce8a69954e8e3e4fe1a0f18a91	2015-03-18 14:33:04 -07:00
Jingning Han	2cfddec332	Refactor column integral projection computation Move the scaling factor outside column projection. This avoids repeated calculation of the same scaling factor. Profiling shows that the percentage of vp9_int_pro_col_sse2 of overall cycles goes from 2.29% down to 1.88%. Change-Id: I5ac4e324ab2d7f33ba2de66dd2a12e04e04dfd66	2015-03-16 12:07:15 -07:00
Jingning Han	fcb96b3afd	Fix fdct8x8_quant ssse3 overflow issue This resolves webm issue 968. Change-Id: Ieb363129b1e135a561141c68211d413226aba754	2015-03-12 12:43:19 -07:00
Jingning Han	54eda13f8d	Apply fast motion search to golden reference frame This commit enables the rtc coding mode to run integral projection based motion search for golden reference frame. It improves the speed -6 compression performance by 1.1% on average, 3.46% for jimred_vga, 6.46% for tacomascmvvga, and 0.5% for vidyo clips. The speed -6 is about 6% slower. Change-Id: I0fe402ad2edf0149d0349ad304ab9b2abdf0c804	2015-03-11 16:03:49 -07:00
Jingning Han	a521008201	Scale the normalization factor depending on the block size Change-Id: I0a26994bf65ea224e496b09af2ce71e1a4210433	2015-03-03 11:29:46 -08:00
Jingning Han	1790d45252	Use variance metric for integral projection vector match This commit replaces the SAD with variance as metric for the integral projection vector match. It improves the search accuracy in the presence of slight light change. The average speed -6 compression performance for rtc set is improved by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d	2015-03-01 10:42:56 -08:00
Jingning Han	73a00d3219	Refactor integral projection based motion estimation Support variable block size integral projection based motion estimation. Change-Id: Iee6d65e44df4480aa13fb7b84b9c91914b89caa1	2015-02-26 14:48:59 -08:00
Yunqing Wang	419ff1352e	Merge "Fix ssse3 quantize_fp functions while skip=1"	2015-02-25 10:10:10 -08:00
Jingning Han	e47033319d	Fix fwd transform sse2 build issue on older gcc version Change-Id: I3e0e53d129552babf29e6c5d047483733983973c	2015-02-24 23:25:21 -08:00
Yunqing Wang	58e0159c80	Fix ssse3 quantize_fp functions while skip=1 In ssse3 functions, DEFINE_ARGS macro hard codes qcoeff and dqcoeff to r3 and r4. If skip is 1, qcoeff and dqcoeff need to be loaded from the stack, which doesn't work because of the above definitions. Currently, skip=1 case is not used in the encoder. This patch fixed the issue, so it can be turned on later. Change-Id: I998d696b1a7a85dca2b3bcee790b21c21e039147	2015-02-24 10:37:05 -08:00
Jingning Han	ed2dc59c1b	Integral projection based motion estimation This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region is projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The overall CPU cycle cost due to this additional 64x64 block match (SSE2 version) takes around 2% at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were tested in RTC speed -6 setting: cloud 1080p 500 kbps 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster) pedestrian_area 1080p 500 kbps 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings) blue_sky 1080p 500 kbps 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster) jimred 400 kbps 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower) Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c	2015-02-19 13:47:19 -08:00
Yunqing Wang	789ae447f8	Fix high bit depth assembly function bugs The high bit depth build failed while building for 32bit target. The bugs were in vp9_highbd_subpel_variance.asm and vp9_highbd_sad4d_sse2.asm functions. This patch fixed the bugs, and made 32bit build work. Change-Id: Idc8e5e1b7965bb70d4afba140c6583c5d9666b75	2015-02-05 11:24:03 -08:00
Yunqing Wang	10d5e09c87	Fix issues in 32bit PIC enabled build This patch was to fix issue 924: https://code.google.com/p/webm/issues/detail?id=924 The SECTION_RODATA macro was modified to support macho32 format. The sub-pixel functions were modified to pass in 2 more parameters to handle the global offsets for PIC build. Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4	2015-01-27 22:20:21 -08:00
Jingning Han	d0f2377027	Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value."" This reverts commit 9946ee23e0a4c158e26a505b162a072f81b8a3be. Fix the ssse3 asm function. Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07	2014-12-22 10:09:25 -08:00
Paul Wilkins	9946ee23e0	Revert "Removal of legacy zbin_extra / zbin_oq_value." This reverts commit e9b586e21bb899e247346e82bccf5afb42604910. Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4	2014-12-19 15:02:58 +00:00
Paul Wilkins	e9b586e21b	Removal of legacy zbin_extra / zbin_oq_value. zbin extra / zbin_oq_value was widely passed around, hence removal touches a lot of code. Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5	2014-12-18 16:49:11 +00:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
Deb Mukherjee	6615706af2	sse2 visual studio build fix Change-Id: Id8c8c3be882bcd92afea3ccec6ebdf3f208d28ef	2014-12-03 16:35:26 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit 005d80cd05269a299cd2f7ddbc3d4d8b791aebba) (cherry picked from commit 08d2f548007fd8d6fd41da8ef7fdb488b6485af3) (cherry picked from commit 4230c2306c194c058f56433a5275aa02a2e71d56)	2014-12-02 11:16:24 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00

245 Commits