Original fix by one of these developers:
Anton Khirnov <anton@khirnov.net>
Diego Biurrun <diego@biurrun.de>
Luca Barbato <lu_zero@gentoo.org>
Martin Storsjö <martin@martin.st>
See 97962b2 / 72ca830
Personal guess is Diego Biurrun.
1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips
1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips
1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips
529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips
516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips
474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips
(~3.9x faster, comparing the longest runs: 1864 vs. 474 decicycles)
7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips
7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips
7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips
1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips
1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips
1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips
(~7.1x faster, comparing the longest runs: 7731 vs. 1086 decicycles)
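The speedup figures above can be re-derived from the timer lines. What follows is a minimal sketch in Python, assuming the lines keep exactly the format shown here. The names `parse_timer_line` and `speedup` are hypothetical helpers made up for illustration and are not part of any existing tool. It compares the longest run of each function, which appears to be how the quoted ratios were obtained.

```python
import re

# Matches lines such as
# "1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips".
TIMER_LINE = re.compile(r"(\d+) decicycles in (\w+),\s*(\d+) runs,\s*(\d+) skips")


def parse_timer_line(line):
    """Return (function name, decicycles, runs) for one timer line.

    Hypothetical helper: it assumes the exact line format shown above.
    """
    m = TIMER_LINE.search(line)
    if m is None:
        raise ValueError(f"not a timer line: {line!r}")
    return m.group(2), int(m.group(1)), int(m.group(3))


def speedup(c_line, simd_line):
    """Ratio of the C decicycle count to the SIMD decicycle count."""
    _, c_cycles, _ = parse_timer_line(c_line)
    _, simd_cycles, _ = parse_timer_line(simd_line)
    return c_cycles / simd_cycles


# Longest runs taken from the measurements above.
ratio_4x4 = speedup(
    "1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips",
    "474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips",
)
ratio_8x8 = speedup(
    "7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips",
    "1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips",
)
print(f"4x4: {ratio_4x4:.1f}x, 8x8: {ratio_8x8:.1f}x")
```

Averaging the three runs per function instead of taking the longest one gives somewhat lower ratios, so the choice of run is worth stating whenever the numbers are quoted.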
Overall decode time before commit:
16.48s user 0.03s system 99% cpu 16.526 total
16.54s user 0.01s system 99% cpu 16.566 total
16.46s user 0.03s system 99% cpu 16.511 total
Overall decode time after commit:
16.34s user 0.02s system 99% cpu 16.378 total
16.28s user 0.02s system 99% cpu 16.315 total
16.32s user 0.03s system 99% cpu 16.366 total
Tested on an i7 920 with 40 s of 1080p footage.
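The overall decode times can be summarised the same way. This is a rough sketch, assuming the "total" column of each timing line is the wall-clock time in seconds. `percent_saved` is a hypothetical helper name used only for illustration.

```python
from statistics import mean

# Wall-clock totals in seconds, copied from the timings above.
before = [16.526, 16.566, 16.511]
after = [16.378, 16.315, 16.366]


def percent_saved(before_runs, after_runs):
    """Relative reduction of the mean wall-clock time, in percent."""
    b = mean(before_runs)
    a = mean(after_runs)
    return 100.0 * (b - a) / b


print(f"{percent_saved(before, after):.1f}% less wall-clock time")
```

With only three runs per configuration this is an indication and not a statistically robust measurement.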