~60-65% faster at the function level across block sizes Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893