generic-library/vpx

Author	SHA1	Message	Date
James Zern	807248ec81	Merge "ppc: Add vpx_idct32x32_1024_add_vsx"	2017-10-07 19:08:26 +00:00
Linfeng Zhang	127864deb3	Generalize 2:1 vp9_scale_and_extend_frame_ssse3() Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5	2017-10-04 12:35:39 -07:00
Linfeng Zhang	9a71811d98	Merge changes Id6a8c549,Ib1e0650b,Ic369dd86 * changes: Refactor x86/vpx_subpixel_8t_intrin_ssse3.c; Add vpx_dsp/x86/mem_sse2.h; Add transpose_8bit_{4x4,8x8}() x86 optimization	2017-10-04 16:15:14 +00:00
James Zern	66b6b87471	Merge "vpx: fix nasm build errors"	2017-10-03 21:47:49 +00:00
Scott LaVarnway	bc4bc9b622	vpx: fix nasm build errors BUG=webm:1462,766721 Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98	2017-10-03 20:02:21 +00:00
Linfeng Zhang	6543213e87	Refactor x86/vpx_subpixel_8t_intrin_ssse3.c Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac	2017-10-03 13:02:05 -07:00
Linfeng Zhang	0f756a307d	Add vpx_dsp/x86/mem_sse2.h Add some load and store sse2 inline functions. Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222	2017-10-03 12:59:05 -07:00
Linfeng Zhang	67c38c92e7	Add transpose_8bit_{4x4,8x8}() x86 optimization Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878	2017-10-03 10:00:30 -07:00
Alexandra Hájková	fb7fc1dbda	ppc: Add vpx_idct32x32_1024_add_vsx Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d	2017-09-30 19:31:20 +00:00
Scott LaVarnway	3bbd62ed27	vpxdsp: [x86] add highbd_d135_predictor functions. C vs SSE2 speed gains: _4x4: ~1.81x. C vs SSSE3 speed gains: _8x8: ~1.96x, _16x16: ~1.88x, _32x32: ~2.02x. BUG=webm:1411 Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0	2017-09-29 08:56:38 -07:00
Scott LaVarnway	4cae64c32c	vpxdsp: [x86] add highbd_d117_predictor functions. C vs SSE2 speed gains: _4x4: ~2.04x. C vs SSSE3 speed gains: _8x8: ~2.82x, _16x16: ~5.93x, _32x32: ~2.79x. BUG=webm:1411 Change-Id: I31d949695991c067dac89d91e0bed3e666c94993	2017-09-28 14:45:28 -07:00
Scott LaVarnway	80992a746c	Merge "vpxdsp: [x86] add highbd_d153_predictor functions"	2017-09-27 20:40:21 +00:00
James Zern	690fa6bb6e	Merge "fix signed integer overflow of idct"	2017-09-27 19:39:11 +00:00
Linfeng Zhang	dbbbd44304	fix signed integer overflow of idct Exposed by fuzz test in high bitdepth. The bug is introduced in commit `64653fa`. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5	2017-09-27 11:17:54 -07:00
Scott LaVarnway	19c45ccd43	vpxdsp: [x86] add highbd_d153_predictor functions. C vs SSE2 speed gains: _4x4: ~1.95x. C vs SSSE3 speed gains: _8x8: ~3.30x, _16x16: ~5.67x, _32x32: ~3.87x. BUG=webm:1411 Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904	2017-09-27 11:01:11 -07:00
Linfeng Zhang	9d0d13e939	Add vpx_scaled_2d_neon() BUG=webm:1419 Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96	2017-09-26 09:22:39 -07:00
Linfeng Zhang	28762341ac	Merge changes Ib9105462,Idfac00ed,If8d8a0e2 * changes: cosmetics: NEON scaling code; Refactor convolve NEON code; Refactor convolve code	2017-09-26 16:10:46 +00:00
Scott LaVarnway	a059dc0986	Merge "vpxdsp: [x86] add highbd_d45_predictor functions"	2017-09-25 11:34:14 +00:00
Scott LaVarnway	cf82f7276e	vpxdsp: [x86] add highbd_d45_predictor functions. C vs SSSE3 speed gains: _4x4: ~2.45x, _8x8: ~10.61x, _16x16: ~11.34x, _32x32: ~6.36x. BUG=webm:1411 Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09	2017-09-22 05:20:12 -07:00
Linfeng Zhang	d586cdb4d4	Remove the unnecessary cast of (int16_t)cospi_{1...31}_64 BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8	2017-09-20 14:13:26 -07:00
Linfeng Zhang	76a3d3fcc5	Remove the unnecessary upcasts of (int)cospi_{1...31}_64 BUG=webm:1450 Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858	2017-09-20 14:13:26 -07:00
Linfeng Zhang	64653fa133	Change cospi_{1...31}_64 from tran_high_t to tran_coef_t The unnecessary upcast to (int) will be cleaned later. BUG=webm:1450 Change-Id: Ia234575206d5a74540526924b06ed3939322d063	2017-09-20 14:13:26 -07:00
Scott LaVarnway	b85e391ac8	Merge "vpxdsp: [x86] add highbd_d63_predictor functions"	2017-09-20 11:39:28 +00:00
Linfeng Zhang	7c0529728a	cosmetics: NEON scaling code Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618	2017-09-19 16:39:17 -07:00
Linfeng Zhang	f357335c38	Refactor convolve NEON code Rename a couple of hbd static functions. Move the position of NEON function convolve8_4(). Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349	2017-09-19 16:28:36 -07:00
Linfeng Zhang	bf8bdae913	Refactor convolve code Extract a couple of static functions into their caller functions. Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0	2017-09-19 16:28:31 -07:00
Scott LaVarnway	bc86e2c6a2	vpxdsp: [x86] add highbd_d63_predictor functions. C vs SSE2 speed gains: _4x4: ~2.94x. C vs SSSE3 speed gains: _8x8: ~8.69x, _16x16: ~6.32x, _32x32: ~5.33x. BUG=webm:1411 Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4	2017-09-19 15:47:22 -07:00
Linfeng Zhang	a80bdfd081	Change sinpi_{1,2,3,4}_9 from tran_high_t to int16_t Add "typedef int16_t tran_coef_t;" BUG=webm:1450 Change-Id: I67866f104898d1dda8989e1abdaf6983fe324154	2017-09-18 09:26:03 -07:00
Linfeng Zhang	9d278465b5	Merge "cosmetics: vp9_rtcd_defs.pl"	2017-09-18 16:23:33 +00:00
Kaustubh Raste	4ca8f8f5e2	mips msa clean-up msa macros Removed inline for GP load-store in case of (__mips_isa_rev >= 6) Created one define LD_V for vector load and ST_V for vector store Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448	2017-09-14 12:29:19 +05:30
Linfeng Zhang	535dee0fb6	cosmetics: vp9_rtcd_defs.pl Change-Id: I1bf57824e07fa4f8b3b5574984117f2bd7a1c086	2017-09-13 12:13:55 -07:00
Johann Koenig	ed3a80cb5e	Merge "Revert "Revert "quantize avx: copy 32x32 implementation"""	2017-09-13 14:44:53 +00:00
Johann	eb4238ac70	Revert "Revert "quantize avx: copy 32x32 implementation"". This reverts commit `8c42237bb2`. Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned. Original change's description: "quantize avx: copy 32x32 implementation. Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c" Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06	2017-09-12 14:25:38 -07:00
Kaustubh Raste	30f1ff94e0	Optimize mips msa vp9 average mc functions Load the specific destination loads instead of vector load Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe	2017-09-12 16:12:11 +05:30
Scott LaVarnway	c39cd9235e	Merge "vpxdsp: [x86] add highbd_d207_predictor functions"	2017-09-11 22:32:23 +00:00
Linfeng Zhang	a9bbe53dbb	Add 4 to 1 scaling NEON optimization BUG=webm:1419 Change-Id: If82a93935d2453e61b7647aae70983db1740bec7	2017-09-11 10:17:28 -07:00
Scott LaVarnway	d6c9bbc2b6	vpxdsp: [x86] add highbd_d207_predictor functions. C vs SSE2 speed gains: _4x4: ~2.31x. C vs SSSE3 speed gains: _8x8: ~4.73x, _16x16: ~10.88x, _32x32: ~4.80x. BUG=webm:1411 Change-Id: I0bac29db261079181ddabc6814bd62c463109caf	2017-09-11 07:36:24 -07:00
James Zern	fb40b5d7a7	intrapred: sync highbd_d63_predictor w/ d63_. 8/16/32: ~6%/~18%/~33% faster. Previously: `7012ba639` (vp9_reconintra: simplify d63_predictor). BUG=webm:1411 Change-Id: Ie775f3a4f7fd74df44754e65686d826a51c2cdc2	2017-09-08 19:28:01 -07:00
James Zern	5c95fd921e	intrapred: sync highbd_d45_predictor w/ d45_. 8/16/32: ~19%/~54%/~75.5% faster. Previously: `acc481eaa` (vp9_reconintra: simplify d45_predictor). BUG=webm:1411 Change-Id: Ie8340b0c5070ae640f124733f025e4e749b660d8	2017-09-08 19:09:07 -07:00
James Zern	9a2dd7e67e	Merge changes I9ec438aa,I99c954ff * changes: Update convolve functions' assertions; Add 2 to 1 scaling NEON optimization	2017-09-08 19:23:40 +00:00
Shiyou Yin	2c7b7424c5	Merge "vpxdsp: [loongson] optimize sad functions with mmi"	2017-09-08 00:55:14 +00:00
Linfeng Zhang	ef41c6286d	Update convolve functions' assertions So that 4 to 1 frame scaling can call them. Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5	2017-09-07 12:33:58 -07:00
Linfeng Zhang	3ec20445b2	Refactor convolve8 NEON functions Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c	2017-09-06 15:55:17 -07:00
Linfeng Zhang	7219f31904	Merge "Remove get_filter_base() and get_filter_offset() in convolve"	2017-09-06 22:39:15 +00:00
Linfeng Zhang	d331e7a1c0	Remove get_filter_base() and get_filter_offset() in convolve so that the convolve functions are independent of table alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee	2017-09-05 15:22:36 -07:00
Scott LaVarnway	bc4bcca3fd	vpxdsp: [x86] add highbd_dc_128_predictor functions. C vs SSE2 speed gains: _4x4: ~7.64x, _8x8: ~16.60x, _16x16: ~8.15x, _32x32: ~5.05x. BUG=webm:1411 Change-Id: If165d419711cfda901bd428a05ca1560a009e62e	2017-09-05 07:57:42 -07:00
Shiyou Yin	f4150163a2	vpxdsp: [loongson] optimize sad functions with mmi 1. vpx_sadWxH_c 2. vpx_sadWxH_avg_c 3. vpx_sadWxHx3_c 4. vpx_sadWxHx8_c 5. vpx_sadWxHx4d_c Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a	2017-09-02 15:11:32 +00:00
James Zern	334e9abb0b	Merge "inv_txfm_vsx: fix loads in high-bitdepth"	2017-09-01 03:09:49 +00:00
James Zern	f8f64c309b	inv_txfm_vsx: fix loads in high-bitdepth vec_vsx_ld -> load_tran_low Change-Id: Id3144cdd528d2d406a515e5812e2ea9e4db64bf1	2017-08-30 23:47:56 -07:00
Scott LaVarnway	c39a05ff61	vpxdsp: [x86] add highbd_dc_left_predictor functions. C vs SSE2 speed gains: _4x4: ~6.49x, _8x8: ~10.82x, _16x16: ~7.61x, _32x32: ~5.29x. BUG=webm:1411 Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b	2017-08-30 09:29:06 -07:00
Scott LaVarnway	f783e3a75d	vpxdsp: [x86] add highbd_dc_top_predictor functions. C vs SSE2 speed gains: _4x4: ~7.39x, _8x8: ~11.36x, _16x16: ~8.68x, _32x32: ~4.33x. BUG=webm:1411 Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58	2017-08-29 12:53:30 -07:00
Scott LaVarnway	30d9a1916c	vpxdsp: [x86] add highbd_h_predictor functions. C vs SSE2 speed gains: _4x4: ~8.12x, _8x8: ~9.71x, _16x16: ~8.21x, _32x32: ~5.0x. BUG=webm:1422 Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993	2017-08-28 17:31:18 -07:00
Marco Paniconi	3e069846b9	Merge "Revert "quantize avx: copy 32x32 implementation""	2017-08-25 18:20:31 +00:00
Marco Paniconi	8c42237bb2	Revert "quantize avx: copy 32x32 implementation". This reverts commit `f60d1dcd3d`. Reason for revert: failures in AVX/VP9QuantizeTest in nightly tests. Original change's description: "quantize avx: copy 32x32 implementation. Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c" TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org Change-Id: Ibd38636212269328317dd0721be9d25452113d1c No-Presubmit: true No-Tree-Checks: true No-Try: true	2017-08-25 16:56:08 +00:00
Shiyou Yin	ece1989fa2	Merge "vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi."	2017-08-25 06:44:02 +00:00
Shiyou Yin	9e4647c7ab	vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi. Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174	2017-08-25 01:58:49 +08:00
Johann	f60d1dcd3d	quantize avx: copy 32x32 implementation Ensure avx and ssse3 stay in sync by testing them against each other. Change-Id: I699f3b48785c83260825402d7826231f475f697c	2017-08-24 10:42:34 -07:00
Johann	1787e7dbe0	quantize ssse3: copy implementation to intrinsics Still does not pass tests. Does match the previous assembly, although saving the sign before multiplying is dubious. Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a	2017-08-24 07:47:51 -07:00
Shiyou Yin	d080c92524	Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi."	2017-08-24 00:55:11 +00:00
Johann Koenig	f53b656207	Merge "quantize avx: copy implementation to intrinsics"	2017-08-23 21:14:13 +00:00
Scott LaVarnway	1aad50c092	Merge "vpx_dsp: get32x32var_avx2() cleanup"	2017-08-23 19:59:25 +00:00
Johann Koenig	dfafd10ef5	Merge "quantize neon: round dqcoeff towards zero"	2017-08-23 19:20:53 +00:00
Johann	7c27872164	quantize avx: copy implementation to intrinsics Adds an early exit based on ptest. Slightly slower than ssse3 in the full case because of the extra check, but potentially faster if lots of rows can be skipped. Very close in speed to the assembly. Can run in 32 bit, unlike the assembly. Allows reworking the function prototype to use structs. Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e	2017-08-23 09:19:16 -07:00
Johann	2a5aa98a35	quantize neon: round dqcoeff towards zero Add 1 if negative to get dqcoeff to round towards zero. 10-15% faster than converting to positive before shifting. Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d	2017-08-23 08:05:50 -07:00
Shiyou Yin	59e065b6ed	vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi. Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2	2017-08-23 15:14:15 +08:00
Johann	b9c1dcc5fa	quantize ssse3: copy style from sse2 Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598	2017-08-22 14:25:27 -07:00
Johann	75752ab7c0	quantize sse2: copy opts from ssse3 Simplify eob calculations based on ssse3 implementation. General clean up and re-scoping. Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee	2017-08-22 13:01:44 -07:00
Johann Koenig	ab27b68693	Merge changes Icfb70687,I9a963e99,Ie8ac00ef,I1272917c * changes: quantize: ignore skip_block in arm; quantize: ignore skip_block in x86; quantize fp: ignore skip_block in arm; quantize fp: ignore skip_block in x86	2017-08-22 19:19:14 +00:00
James Zern	419ce36294	Merge "ppc: Add vpx_idct16x16_256_add_vsx"	2017-08-22 00:48:39 +00:00
Shiyou Yin	bff5aa9827	Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."	2017-08-22 00:37:23 +00:00
Johann	2c56bb97f2	quantize: ignore skip_block in arm Change-Id: Icfb70687476b2edb25d255793ba325b261d40584	2017-08-21 14:37:50 -07:00
Johann	c02fdd0258	quantize: ignore skip_block in x86 Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8	2017-08-21 14:37:03 -07:00
Johann	13eed991f9	Remove skip_block from quantize This condition is handled before this code is reached. The ssse3 version of the function has always crashed when attempting to handle the skip_block condition. Add assert() and comments regarding the usage of skip_block. Removing the parameter is a fairly involved process so leave it be for the moment. Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a	2017-08-21 09:49:04 -07:00
Scott LaVarnway	eab3f5e0cc	vpx_dsp: get32x32var_avx2() cleanup renamed to get32x16var_avx2() BUG=webm:1404 Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036	2017-08-18 13:44:09 -07:00
Scott LaVarnway	2c5478e383	Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"	2017-08-18 20:30:59 +00:00
Scott LaVarnway	2f7497f341	vpx_dsp: vpx_get16x16var_avx2() cleanup BUG=webm:1404 Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a	2017-08-18 12:23:49 -07:00
Johann Koenig	1426f04e91	Merge "quantize: normalize intermediate types"	2017-08-18 16:00:28 +00:00
Shiyou Yin	7d82e57f5b	vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi. Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e	2017-08-18 09:06:49 +08:00
James Zern	bb15fd51be	highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo 135 -> 34 fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12] Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682	2017-08-17 15:37:38 -07:00
Johann	7f602d6114	quantize: normalize intermediate types. Despite abs_coeff being a positive value, all the other implementations treat it as signed, which simplifies restoring the sign. HBD builds cast qcoeff to avoid a Visual Studio warning. Match vp9_quantize.c style of casting the entire expression. Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5	2017-08-17 12:34:28 -07:00
James Zern	e038d1610e	inv_txfm_sse2.h: correct idct/iadst prototypes fixes mismatch between prototypes and definitions Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652	2017-08-16 23:06:09 -07:00
Linfeng Zhang	f95686895b	Merge changes I08b562b6,Ia275940a,I51106e90 * changes: Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}; Update highbd idct x86 optimizations; Update 32x32 idct sse2 and ssse3 optimizations.	2017-08-16 16:36:37 +00:00
Jerome Jiang	6b9c691daf	Merge "Clean up writing YUV files for debug purpose."	2017-08-15 18:28:54 +00:00
Jerome Jiang	a153080b55	Clean up writing YUV files for debug purpose. Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files. Delete some flags that can be enabled during build. To enable writing denoised YUV, use the following command line: CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure --enable-vp9-temporal-denoising; for skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'. Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528	2017-08-15 10:44:03 -07:00
Johann	77ed4414d6	quantize: silence unsigned overflow warning The result of the xor operation is unsigned. If coeff was negative, this results in an unsigned value - INT_MIN. Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0	2017-08-15 09:48:24 -07:00
Linfeng Zhang	d72e20b123	Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f	2017-08-14 17:05:22 -07:00
Linfeng Zhang	69775d2f40	Update highbd idct x86 optimizations. BUG=webm:1412 Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069	2017-08-14 16:59:50 -07:00
Linfeng Zhang	3f05a70c41	Update 32x32 idct sse2 and ssse3 optimizations. Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70	2017-08-14 16:59:31 -07:00
Linfeng Zhang	15193ce51f	Merge "Clean highbd idct x86 code with inline functions"	2017-08-10 20:25:18 +00:00
Johann Koenig	9bb8ce5efb	Merge "neon: vpx_quantize_b_32x32"	2017-08-10 15:42:49 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Linfeng Zhang	39da7fb786	Clean highbd idct x86 code with inline functions Created inline functions highbd_butterfly_cospi16_sse2() and highbd_butterfly_cospi16_sse4_1() BUG=webm:1412 Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a	2017-08-08 17:53:28 -07:00
Johann	93166c5e51	neon: vpx_quantize_b_32x32 With skip block the neon is about twice as fast as C. The neon has no shortcut for coeff < zbin so it always takes the same amount of time. Even if the C can take the shortcut, it is over twice as fast in neon. If it can't, that gap increases to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6	2017-08-08 14:05:18 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	d670678f26	Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx() in idct x86 code Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083	2017-08-04 15:33:37 -07:00
Linfeng Zhang	fa829e0e5a	Replace multiplication_and_add() with butterfly() in idct x86 code Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e	2017-08-04 15:33:34 -07:00
Linfeng Zhang	c9fb719ee1	Update butterfly() in idct x86 optimizations. Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701	2017-08-04 15:33:28 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Linfeng Zhang	22b6dc9fdf	Update for loop increment of idct x86 functions Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4	2017-08-04 15:29:19 -07:00


979 Commits