Ganesh Ajjanagadde
68a0a164d1
avfilter/vf_removegrain: replace qsort with AV_QSORT
...
filter_slice calls qsort, so qsort is in a performance critical
position. AV_QSORT is substantially faster due to the inlining of the
comparison callback. Thus, the increase in performance is worth the
increase in binary size.
Sample benchmark (x86-64, Haswell, GNU/Linux),
filter-removegrain-mode-02 (from FATE)
new:
24060 decicycles in qsort, 1 runs, 0 skips
15690 decicycles in qsort, 2 runs, 0 skips
9307 decicycles in qsort, 4 runs, 0 skips
5572 decicycles in qsort, 8 runs, 0 skips
3485 decicycles in qsort, 16 runs, 0 skips
2517 decicycles in qsort, 32 runs, 0 skips
1979 decicycles in qsort, 64 runs, 0 skips
1911 decicycles in qsort, 128 runs, 0 skips
1568 decicycles in qsort, 256 runs, 0 skips
1596 decicycles in qsort, 512 runs, 0 skips
1614 decicycles in qsort, 1024 runs, 0 skips
1874 decicycles in qsort, 2046 runs, 2 skips
2186 decicycles in qsort, 4094 runs, 2 skips
old:
246960 decicycles in qsort, 1 runs, 0 skips
135765 decicycles in qsort, 2 runs, 0 skips
70920 decicycles in qsort, 4 runs, 0 skips
37710 decicycles in qsort, 8 runs, 0 skips
20831 decicycles in qsort, 16 runs, 0 skips
12225 decicycles in qsort, 32 runs, 0 skips
8083 decicycles in qsort, 64 runs, 0 skips
6270 decicycles in qsort, 128 runs, 0 skips
5321 decicycles in qsort, 256 runs, 0 skips
4860 decicycles in qsort, 512 runs, 0 skips
4424 decicycles in qsort, 1024 runs, 0 skips
4191 decicycles in qsort, 2046 runs, 2 skips
4934 decicycles in qsort, 4094 runs, 2 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-26 07:14:22 -04:00
James Darnley
bff7242608
avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions
...
Speed of all modes increased by a factor between 7.4 and 19.8 largely depending
on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped-up
by a factor of 43 (thanks quick sort!)
All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.
With a contribution from James Almer <jamrial@gmail.com>
2015-07-14 23:50:50 +00:00