6fb418741f
Also inline some of the block calculations to help the compiler avoid
redundant work, such as calculating the same offset (or converting
between raster/transform block offsets, or between block, mi and pixel
units) many times over.

Cycle times:
4x4:   584 -> 505 cycles (16% faster)
8x8:   1651 -> 1560 cycles (6% faster)
16x16: 7897 -> 7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
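
For illustration only, a minimal sketch of the kind of redundancy
described above. This is not the code touched by this change: the
function names, the 4x4 block layout and the parameters are all
hypothetical. It only shows the general idea of converting a block
index to a pixel offset once, instead of once per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the codebase): convert a 4x4 block index
 * in raster order into a pixel offset within a plane. */
static size_t block_to_pixel_offset(int block, int blocks_per_row,
                                    int stride) {
  assert(blocks_per_row > 0);
  const int row = block / blocks_per_row; /* block row, in 4x4 units */
  const int col = block % blocks_per_row; /* block column, in 4x4 units */
  return (size_t)(4 * row) * (size_t)stride + (size_t)(4 * col);
}

/* Before: the block-to-pixel conversion is repeated for every pixel.
 * Unless the compiler inlines and hoists it, that is 16 divisions and
 * 16 modulo operations per block. */
static uint32_t sum_block_naive(const uint8_t *plane, int block,
                                int blocks_per_row, int stride) {
  uint32_t sum = 0;
  for (int y = 0; y < 4; ++y) {
    for (int x = 0; x < 4; ++x) {
      const size_t base = block_to_pixel_offset(block, blocks_per_row, stride);
      sum += plane[base + (size_t)y * (size_t)stride + (size_t)x];
    }
  }
  return sum;
}

/* After: the conversion is done once and the loops walk row pointers. */
static uint32_t sum_block_hoisted(const uint8_t *plane, int block,
                                  int blocks_per_row, int stride) {
  const uint8_t *src =
      plane + block_to_pixel_offset(block, blocks_per_row, stride);
  uint32_t sum = 0;
  for (int y = 0; y < 4; ++y) {
    const uint8_t *row = src + (size_t)y * (size_t)stride;
    for (int x = 0; x < 4; ++x) sum += row[x];
  }
  return sum;
}
```

Both versions return the same result; the second simply makes the
single conversion explicit rather than relying on the optimizer to
find it.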