Faster version of denoiser, cut cost by 1.7x for C path, by 3.3x for SSE2 path. Change-Id: I154786308550763bc0e3497e5fa5bfd1ce651beb