Shaves at least 3KB off code size on x86 and should improve cache utilization. This would probably be useful to do for other decoders/encoders as well.
prob[0] is the only prob array ever accessed, so prob[1] can serve as padding for prob[0].
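
A minimal sketch of the padding idea, not the actual decoder code. It assumes the padding is needed because some routine reads a few bytes past the end of prob[0]; the names Ctx, PROB_LEN and load_bytes are hypothetical. Since the two elements of a C array are contiguous, such a read lands in prob[1] rather than outside the object:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROB_LEN 5   /* hypothetical size, deliberately not a multiple of 4 */

typedef struct {
    uint8_t prob[2][PROB_LEN];   /* prob[1] sits directly after prob[0] */
} Ctx;                           /* hypothetical context struct */

/* Copy n bytes starting at byte offset i of prob[0]. The read may run past
 * the end of prob[0]; it is valid as long as it stays inside the whole
 * prob object, i.e. it spills into prob[1] at most. */
static void load_bytes(const Ctx *c, size_t i, uint8_t *out, size_t n)
{
    assert(i <= sizeof c->prob && n <= sizeof c->prob - i);
    memcpy(out, (const uint8_t *)c->prob + i, n);
}
```

With this layout, a 4-byte load starting at the last byte of prob[0] picks up three bytes of prob[1], so no separate padding field is required as long as the overread is shorter than prob[1].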