
The transpose_4x4H is wrong which cost me much time to find this bug. The orders of r2 and r3 are wrong, this bug waste me much time while I make aarch64 arm instruction which used the function.
The transpose_4x4H is wrong which cost me much time to find this bug. The orders of r2 and r3 are wrong, this bug waste me much time while I make aarch64 arm instruction which used the function.