3 Commits

Author SHA1 Message Date
James Almer
d2a7314f1e vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2().
Similar gains in performance as the SSSE3 version

Signed-off-by: James Almer <jamrial@gmail.com>
2014-01-17 14:16:38 +01:00
Clément Bœsch
8b4190da93 vp9/x86: add AVX for itxfm and lpf.
4412 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 4193462 runs, 842 skips
3600 decicycles in ff_vp9_loop_filter_h_16_16_avx, 4193621 runs, 683 skips

3010 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193528 runs, 776 skips
2678 decicycles in ff_vp9_loop_filter_v_16_16_avx, 4193742 runs, 562 skips

23025 decicycles in ff_vp9_idct_idct_32x32_add_ssse3, 2096871 runs, 281 skips
19943 decicycles in ff_vp9_idct_idct_32x32_add_avx, 2096815 runs, 337 skips

4675 decicycles in ff_vp9_idct_idct_16x16_add_ssse3, 4194018 runs, 286 skips
3980 decicycles in ff_vp9_idct_idct_16x16_add_avx, 4194022 runs, 282 skips

967 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 16776972 runs, 244 skips
887 decicycles in ff_vp9_idct_idct_8x8_add_avx, 16777002 runs, 214 skips
2014-01-15 15:54:03 +01:00
Clément Bœsch
af68bd1c06 vp9/x86: add ff_vp9_loop_filter_[vh]_16_16_ssse3().
16662 decicycles in loop_filter_h_16_16_c, 8387355 runs, 1253 skips
17510 decicycles in loop_filter_v_16_16_c, 8387516 runs, 1092 skips

4941 decicycles in ff_vp9_loop_filter_h_16_16_ssse3, 8387887 runs, 721 skips
3899 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 8387980 runs, 628 skips

Overall decode time goes from:
  ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null -  8.10s user 0.02s system 99% cpu 8.126 total
to:
  ./ffmpeg -v 0 -nostats -threads 1 -i ~/samples/vp9/ped1080p.webm -f null -  6.15s user 0.04s system 99% cpu 6.199 total

(46 to 61 fps)
2014-01-12 20:20:24 +01:00