vpx/vpx_dsp/x86
Geza Lore 9cfba09ac0 Optimize vpx_quantize_{b,b_32x32} assembler.
Added optimization of the 8-bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.

This patch makes the encoder faster in both the high-bitdepth and 8-bit
configurations. In the high-bitdepth configuration, it affects profile 0
only.

Based on my profiling using 1080p input, the net gain is between 1% and 3%
for the 8-bit config, and around 2.5-4.5% for the high-bitdepth config,
depending on target bitrate. For the same encoder run, the difference
between the 8-bit and high-bitdepth configurations is reduced by 1% in all
cases I have profiled.

Change-Id: I86714a6b7364da20cd468cd784247009663a5140
2015-10-20 10:11:19 +01:00
convolve.h VPX: remove step == 16 and filter[3] != 128 checks 2015-08-10 13:44:32 -07:00
fwd_dct32x32_impl_avx2.h Factor 32x32 fwd DCT to vpx_dsp folder 2015-07-28 11:13:41 -07:00
fwd_dct32x32_impl_sse2.h Replace vp9_ prefix in 2D-DCT functions with vpx_ 2015-07-28 16:06:44 -07:00
fwd_txfm_avx2.c Replace vp9_ prefix in 2D-DCT functions with vpx_ 2015-07-28 16:06:44 -07:00
fwd_txfm_impl_sse2.h Replace vp9_ prefix in 2D-DCT functions with vpx_ 2015-07-28 16:06:44 -07:00
fwd_txfm_sse2.c Replace vp9_ prefix in 2D-DCT functions with vpx_ 2015-07-28 16:06:44 -07:00
fwd_txfm_sse2.h Move forward dct sse2 header file to vpx_dsp 2015-07-27 14:59:57 -07:00
fwd_txfm_ssse3_x86_64.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
halfpix_variance_impl_sse2.asm Add sse2 versions of halfpix variance 2015-08-27 11:58:38 -07:00
halfpix_variance_sse2.c Add sse2 versions of halfpix variance 2015-08-27 11:58:38 -07:00
highbd_intrapred_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
highbd_loopfilter_sse2.c Rename loop filter function from vp9_ to vpx_ 2015-07-17 15:55:02 -07:00
highbd_quantize_intrin_sse2.c Change vp9_quantize to vpx_quantize 2015-08-04 15:31:49 -07:00
highbd_sad4d_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
highbd_sad_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
highbd_subpel_variance_impl_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
highbd_variance_impl_sse2.asm Move variance functions to vpx_dsp 2015-05-26 12:01:52 -07:00
highbd_variance_sse2.c Clean out more MSVC warnings 2015-07-08 15:09:20 -07:00
intrapred_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
intrapred_ssse3.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
inv_txfm_sse2.c VPX: refactor vpx_idct32x32_1_add_sse2() 2015-10-05 06:33:42 -07:00
inv_txfm_sse2.h Accelerated transform in high bit depth 2015-09-28 21:09:16 -07:00
inv_txfm_ssse3_x86_64.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
inv_wht_sse2.asm Rename inv_txfm_sse2.asm to inv_wht_sse2.asm 2015-08-19 10:29:53 -07:00
loopfilter_avx2.c Rename loop filter function from vp9_ to vpx_ 2015-07-17 15:55:02 -07:00
loopfilter_mmx.asm Rename loop filter function from vp9_ to vpx_ 2015-07-17 15:55:02 -07:00
loopfilter_sse2.c Rename loop filter function from vp9_ to vpx_ 2015-07-17 15:55:02 -07:00
quantize_avx_x86_64.asm Optimize vpx_quantize_{b,b_32x32} assembler. 2015-10-20 10:11:19 +01:00
quantize_sse2.c SSE2 optimisation for quantize in high bit depth 2015-10-05 10:59:16 -07:00
quantize_ssse3_x86_64.asm Remove 4 mova insts from quantize_ssse3_x86_64.asm 2015-10-09 07:52:04 -07:00
sad4d_avx2.c sad*_avx2.c: sync function signatures 2015-05-14 20:58:56 -07:00
sad4d_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
sad_avx2.c sad*_avx2.c: sync function signatures 2015-05-14 20:58:56 -07:00
sad_mmx.asm Move shared SAD code to vpx_dsp 2015-05-06 16:58:20 -07:00
sad_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
sad_sse3.asm Move shared SAD code to vpx_dsp 2015-05-06 16:58:20 -07:00
sad_sse4.asm Move shared SAD code to vpx_dsp 2015-05-06 16:58:20 -07:00
sad_ssse3.asm Move shared SAD code to vpx_dsp 2015-05-06 16:58:20 -07:00
ssim_opt_x86_64.asm ssim: Replace unsigned long with uint32_t. 2015-08-07 11:48:31 -07:00
subpel_variance_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
subtract_sse2.asm Use newer x86inc.asm 2015-08-07 16:44:44 -07:00
txfm_common_sse2.h Refactor vp9_idct.h file 2015-07-26 08:26:32 -07:00
variance_avx2.c Move sub pixel variance to vpx_dsp 2015-07-07 15:51:04 -07:00
variance_impl_avx2.c Move sub pixel variance to vpx_dsp 2015-07-07 15:51:04 -07:00
variance_impl_mmx.asm Move sub pixel variance to vpx_dsp 2015-07-07 15:51:04 -07:00
variance_mmx.c Move sub pixel variance to vpx_dsp 2015-07-07 15:51:04 -07:00
variance_sse2.c Move sub pixel variance to vpx_dsp 2015-07-07 15:51:04 -07:00
vpx_asm_stubs.c Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_convolve_copy_sse2.asm Add vpx_highbd_convolve_{copy,avg}_sse2 2015-10-09 11:50:25 -07:00
vpx_high_subpixel_8t_sse2.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_high_subpixel_bilinear_sse2.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_subpixel_8t_intrin_avx2.c Upstream Mozilla fix for older Apple clang builds 2015-10-14 07:41:23 -07:00
vpx_subpixel_8t_intrin_ssse3.c Remove vpx_filter_block1d16_v8_intrin_ssse3 2015-09-18 16:05:43 -07:00
vpx_subpixel_8t_sse2.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_subpixel_8t_ssse3.asm vpx_subpixel_8t_ssse3: fix reg counts/access 2015-09-17 12:27:34 -07:00
vpx_subpixel_bilinear_sse2.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00
vpx_subpixel_bilinear_ssse3.asm Code refactor on InterpKernel 2015-07-31 10:27:33 -07:00