The xmm reg count was incorrect, and manual loading of the gprs
furthermore allows to noticeable reduce the number needed.
The modified functions are used in weighted prediction, so only a
few samples like WP_* exhibit a change. For this one and Win64
(some widths removed because of too few occurrences):
WP_A_Toshiba_3.bit, ff_hevc_put_hevc_uni_w
16 32
before: 2194 3872
after: 2119 3767
WP_B_Toshiba_3.bit, ff_hevc_put_hevc_bi_w
16 32 64
before: 2819 4960 9396
after: 2617 4788 9150
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
By default, macro EPEL_FILTER loads the coefficients inconditionally
into m14/m15. This forces an unneeded higher register count.
Reduce that count by making them parameters of EPEL_FILTER.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>