4 Commits

Author SHA1 Message Date
Ganesh Ajjanagadde
68a0a164d1 avfilter/vf_removegrain: replace qsort with AV_QSORT
filter_slice calls qsort, so qsort is in a performance-critical
position. AV_QSORT is substantially faster because the comparison
callback is inlined. Thus, the increase in performance is worth the
increase in binary size.
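
As a rough illustration of why inlining the comparison matters, the sketch
below contrasts the C library qsort(), which calls the comparison through a
function pointer, with a macro that expands the comparison at the call site.
SORT_INLINE, INT_LESS, sort_callback and sort_inline are hypothetical names
made up for this example, and the insertion sort is only a stand-in: this is
not the AV_QSORT implementation nor the code touched by this commit.

```c
#include <stdlib.h>

/* Comparison callback for the C library qsort(). It is reached through a
 * function pointer, so the compiler usually cannot inline it. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical macro (not AV_QSORT): an insertion sort whose comparison
 * `less` is a macro, so it is expanded directly into the loop body. */
#define SORT_INLINE(p, num, type, less) do {              \
    type *p_ = (p);                                       \
    for (size_t i_ = 1; i_ < (size_t)(num); i_++) {       \
        type key_ = p_[i_];                               \
        size_t j_ = i_;                                   \
        while (j_ > 0 && less(key_, p_[j_ - 1])) {        \
            p_[j_] = p_[j_ - 1];                          \
            j_--;                                         \
        }                                                 \
        p_[j_] = key_;                                    \
    }                                                     \
} while (0)

/* Comparison supplied as a macro so that it inlines into SORT_INLINE. */
#define INT_LESS(a, b) ((a) < (b))

/* Callback-based variant: one indirect call per comparison. */
static void sort_callback(int *v, size_t n)
{
    qsort(v, n, sizeof(v[0]), cmp_int);
}

/* Inlined variant: the comparison compiles to a plain integer compare. */
static void sort_inline(int *v, size_t n)
{
    SORT_INLINE(v, n, int, INT_LESS);
}
```

Both functions produce the same ordering. The difference is only in how the
comparison is reached, which is what trades binary size for speed on small
arrays sorted in a hot loop.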

Sample benchmark (x86-64, Haswell, GNU/Linux),
filter-removegrain-mode-02 (from FATE)
new:
  24060 decicycles in qsort,       1 runs,      0 skips
  15690 decicycles in qsort,       2 runs,      0 skips
   9307 decicycles in qsort,       4 runs,      0 skips
   5572 decicycles in qsort,       8 runs,      0 skips
   3485 decicycles in qsort,      16 runs,      0 skips
   2517 decicycles in qsort,      32 runs,      0 skips
   1979 decicycles in qsort,      64 runs,      0 skips
   1911 decicycles in qsort,     128 runs,      0 skips
   1568 decicycles in qsort,     256 runs,      0 skips
   1596 decicycles in qsort,     512 runs,      0 skips
   1614 decicycles in qsort,    1024 runs,      0 skips
   1874 decicycles in qsort,    2046 runs,      2 skips
   2186 decicycles in qsort,    4094 runs,      2 skips

old:
 246960 decicycles in qsort,       1 runs,      0 skips
 135765 decicycles in qsort,       2 runs,      0 skips
  70920 decicycles in qsort,       4 runs,      0 skips
  37710 decicycles in qsort,       8 runs,      0 skips
  20831 decicycles in qsort,      16 runs,      0 skips
  12225 decicycles in qsort,      32 runs,      0 skips
   8083 decicycles in qsort,      64 runs,      0 skips
   6270 decicycles in qsort,     128 runs,      0 skips
   5321 decicycles in qsort,     256 runs,      0 skips
   4860 decicycles in qsort,     512 runs,      0 skips
   4424 decicycles in qsort,    1024 runs,      0 skips
   4191 decicycles in qsort,    2046 runs,      2 skips
   4934 decicycles in qsort,    4094 runs,      2 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-26 07:14:22 -04:00
James Darnley
bff7242608 avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions
Speed of all modes increased by a factor between 7.4 and 19.8, largely depending
on whether bytes are unpacked into words.  Modes 2, 3, and 4 have been sped up
by a factor of 43 (thanks, quick sort!).

All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.

With a contribution from James Almer <jamrial@gmail.com>
2015-07-14 23:50:50 +00:00
Paul B Mahol
ae55fc82a8 avfilter/vf_removegrain: clip to uint16 instead to uint8
This is how the original filter behaves.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-07-10 10:50:28 +00:00
Paul B Mahol
91748662bc avfilter: add removegrain
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-07-08 16:02:34 +00:00