8b0cf5f79d
count can be reduced to a short because the maximum number of filtered
frames is set to 15. The maximum value any single frame can add to a
count is 32 (modifier = 16, filter_weight = 2), so the largest possible
count is 15 * 32 = 480. That requires only 9 bits, well within the 16
bits of a short.

With SSE2, this function goes from about 7000 us / 1000 iterations for
the C code to < 275 us / 1000 iterations for block_size = 16, and from
about 1800 us / 1000 iterations to < 100 us / 1000 iterations for
block_size = 8.

Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
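The overflow argument above can be illustrated with a minimal C sketch of
one temporal filter pass that accumulates into an unsigned short count.
It is written from the description in this message, not copied from the
actual implementation: the function name temporal_filter_apply_sketch,
the helper bits_needed, the MAX_* constants and the strength-based
modifier formula are all hypothetical. A compile-time check confirms that
the worst case (15 frames * 32) fits in an unsigned short.

```c
#include <assert.h>
#include <limits.h>

/* Limits taken from the commit message; the names are hypothetical. */
#define MAX_FILTERED_FRAMES 15
#define MAX_MODIFIER 16
#define MAX_FILTER_WEIGHT 2

/* Worst-case count: every frame contributes MAX_MODIFIER * MAX_FILTER_WEIGHT. */
#define MAX_COUNT (MAX_FILTERED_FRAMES * MAX_MODIFIER * MAX_FILTER_WEIGHT)

/* Compile-time check that the worst-case count fits in an unsigned short. */
_Static_assert(MAX_COUNT <= USHRT_MAX, "count must fit in unsigned short");

/* Number of bits needed to represent v (0 for v == 0). */
static int bits_needed(unsigned int v) {
  int bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

/* Hypothetical sketch of one temporal filter pass over a block.
 * frame1 is the source block (row pitch = stride); frame2 is the predicted
 * block stored contiguously (block_size * block_size bytes).
 * The strength-based modifier formula is an assumption. */
static void temporal_filter_apply_sketch(const unsigned char *frame1,
                                         unsigned int stride,
                                         const unsigned char *frame2,
                                         unsigned int block_size,
                                         int strength, int filter_weight,
                                         unsigned int *accumulator,
                                         unsigned short *count) {
  const int rounding = strength > 0 ? 1 << (strength - 1) : 0;
  unsigned int i, j, k = 0;

  assert(filter_weight >= 0 && filter_weight <= MAX_FILTER_WEIGHT);

  for (i = 0; i < block_size; ++i) {
    for (j = 0; j < block_size; ++j, ++k) {
      const int src = frame1[i * stride + j];
      const int pixel = frame2[k];
      int modifier = src - pixel;

      /* Larger pixel differences give a smaller weight. */
      modifier = (3 * modifier * modifier + rounding) >> strength;
      if (modifier > MAX_MODIFIER) modifier = MAX_MODIFIER;
      modifier = (MAX_MODIFIER - modifier) * filter_weight; /* 0..32 */

      /* At most 32 per call, so 15 calls never push a count past 480. */
      count[k] = (unsigned short)(count[k] + modifier);
      accumulator[k] += (unsigned int)(modifier * pixel);
    }
  }
}
```

Filtering 15 identical 16x16 blocks with filter_weight = 2 leaves every
count entry at 480, and bits_needed(480) returns 9.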