The .align directive takes an argument in number of bits, i.e. the
actual alignment is 2^n. Previously building with binutils failed,
since 16 isn't a valid parameter to .align, the maximum is 15.
Thus, this makes the code try to align to 16 bytes, instead of aligning
to 65536 bytes.
This fixes building for android.
This also clears up the same mistake in the aarch64 code, even though
that one built just fine.
If kiCpuCores < 2, then iCountThreadsNum (and iMultipleThreadIdc)
can't be >= 2, because they're initialized with
WELS_MIN (kiCpuCores, ...) just a few lines above.
If iMultipleThreadIdc is initially set to 0 by the caller, this
removed piece of code would change it to 1, if kiCpuCores < 2.
When iMultipleThreadIdc is changed from the originally set value,
a call to WelsEncoderParamAdjust with the original parameters
would reset the whole codec since iMultipleThreadIdc differs.
This fixes running EncoderInterfaceTest.TemporalLayerSettingTest
on machines where the detected number of cores is 1.
When deblocking over slice edges (in single threaded sliced
encoding), uiNeighborAvail might say that a MB isn't available,
even though it is available for deblocking here.
This fixes the encoder test of dynamic slicing, with neon
optimizations enabled.
In WelsHadamardQuant2x2SkipKernel_AArch64_neon, write the output
of cmhi into v0 - this is the register that is assumed to hold
the output.
In WelsHadamardQuant2x2_AArch64_neon, subtract the count of zero
elements from 4, not from 16.
This makes the encoding unit tests pass again.
According to the arm architecture reference manual, the mov (vector)
instruction can only use the arrangement specifiers '8b' and '16b'.
The apple tools still accept the '8h' form, but it assembles into the
same as '16b'. (When copying a vector register to another, the element
size in the vectors don't matter.)
This fixes building with gnu binutils.