They are still used slightly differently in the encoder and decoder:
the decoder uses plain functions, while the encoder uses an object
that keeps track of the number of allocated bytes and of the
requested alignment.
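As a rough illustration of the two usage styles, here is a minimal
sketch. All names (BlockHeader, AlignedMalloc, AlignedSize, AlignedFree,
MemoryTracker) are hypothetical and are not the identifiers used in the
tree; the layout with a small header in front of each block is just one
possible way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Bookkeeping stored directly in front of each aligned block (hypothetical).
struct BlockHeader {
  void* raw;         // pointer originally returned by std::malloc
  std::size_t size;  // number of bytes the caller asked for
};

// Plain functions, in the style the decoder is described as using.
inline void* AlignedMalloc (std::size_t size, std::size_t alignment) {
  // The alignment must be a power of two and large enough for the header.
  assert (alignment >= alignof (BlockHeader) && (alignment & (alignment - 1)) == 0);
  void* raw = std::malloc (size + alignment - 1 + sizeof (BlockHeader));
  if (raw == nullptr)
    return nullptr;
  std::uintptr_t start = reinterpret_cast<std::uintptr_t> (raw) + sizeof (BlockHeader);
  std::uintptr_t aligned = (start + alignment - 1) & ~static_cast<std::uintptr_t> (alignment - 1);
  BlockHeader* header = reinterpret_cast<BlockHeader*> (aligned) - 1;
  header->raw = raw;
  header->size = size;
  return reinterpret_cast<void*> (aligned);
}

inline std::size_t AlignedSize (const void* ptr) {
  return (reinterpret_cast<const BlockHeader*> (ptr) - 1)->size;
}

inline void AlignedFree (void* ptr) {
  if (ptr != nullptr)
    std::free ((reinterpret_cast<BlockHeader*> (ptr) - 1)->raw);
}

// Object-based variant, in the style the encoder is described as using:
// it remembers the requested alignment and counts the live bytes.
class MemoryTracker {
 public:
  explicit MemoryTracker (std::size_t alignment)
    : alignment_ (alignment), allocatedBytes_ (0) {}

  void* Malloc (std::size_t size) {
    void* ptr = AlignedMalloc (size, alignment_);
    if (ptr != nullptr)
      allocatedBytes_ += size;
    return ptr;
  }

  void Free (void* ptr) {
    if (ptr == nullptr)
      return;
    allocatedBytes_ -= AlignedSize (ptr);
    AlignedFree (ptr);
  }

  std::size_t AllocatedBytes() const { return allocatedBytes_; }
  std::size_t Alignment() const { return alignment_; }

 private:
  std::size_t alignment_;
  std::size_t allocatedBytes_;
};
```

In this sketch the object is only a thin wrapper around the plain
functions, which is what makes sharing the underlying code possible.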
When generating a new version of the header that includes the
actual git hash, don't overwrite the file that is tracked by git.
Instead, create a new file and include it only if the build system
indicates that it exists (by setting a define). This allows the
untouched source tree to be built from within an IDE even if make
has not been run.
This reduces the hassle of dealing with a file that needs to be
ignored in the git configuration.
The downside is that the generated file isn't used when building
from within an IDE, even if the header has been generated by running
make beforehand (since the IDE configuration doesn't know whether the
user has actually run make). Since IDE users might not build via make
on the command line at all (at least not in the same source checkout),
this should not be an issue in practice. With the previous approach,
the version hash (generated by make) could actually be outdated and
misleading when used in an IDE.
This function actually zero-initializes the allocated memory, so
make this clear in the function name.
This also makes the function name match the one used for the same
behaviour in the encoder.
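A minimal sketch of the naming distinction, using hypothetical names
(SketchMalloc and SketchMallocz are not the functions in the tree); the
trailing "z" marks the variant that clears the memory:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Returns memory with indeterminate contents.
inline void* SketchMalloc (std::size_t size) {
  return std::malloc (size);
}

// Returns memory with every byte set to zero, as the name suggests.
inline void* SketchMallocz (std::size_t size) {
  void* ptr = std::malloc (size);
  if (ptr != nullptr)
    std::memset (ptr, 0, size);
  return ptr;
}
```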
the decoder did not produce valid output, call DrainComplete in Drain
and FlushComplete in Flush. Without these, the caller of the GMP may
end up waiting forever for a requested operation to complete.
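The acknowledgement pattern can be sketched as follows. DecoderCallback,
CountingCallback and SketchDecoder are hypothetical stand-ins written for
this example; they are not the real GMP interfaces, and only the method
names DrainComplete and FlushComplete are taken from the text above.

```cpp
// Stand-in for the host-side callback interface (not the real GMP header).
class DecoderCallback {
 public:
  virtual ~DecoderCallback() {}
  virtual void DrainComplete() = 0;
  virtual void FlushComplete() = 0;
};

// Test double that counts how often each completion was signalled.
class CountingCallback : public DecoderCallback {
 public:
  int drains = 0;
  int flushes = 0;
  void DrainComplete() override { ++drains; }
  void FlushComplete() override { ++flushes; }
};

class SketchDecoder {
 public:
  explicit SketchDecoder (DecoderCallback* callback) : callback_ (callback) {}

  void Drain() {
    // ... output any pending frames here ...
    callback_->DrainComplete();  // always acknowledge, even with nothing to output
  }

  void Flush() {
    // ... discard buffered input here ...
    callback_->FlushComplete();  // always acknowledge the flush request
  }

 private:
  DecoderCallback* callback_;
};
```

The point of the sketch is that every request ends in exactly one
completion call, so a caller that waits for it cannot block forever.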
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). Using the more generic versions may cause a small
performance loss, since the functions need to check the width
in every call. Measurements show that the actual change is
very small (and the shared routines turn out to be faster
than the existing ones on ARM NEON setups).
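The per-call width check mentioned above amounts to something like the
following sketch. CopyFixedWidth and CopyLuma are hypothetical names, and
a plain memcpy stands in for the real motion compensation routines.

```cpp
#include <cstdint>
#include <cstring>

// Fixed-width copy: the width is known at compile time, so no check is needed.
template <int kWidth>
inline void CopyFixedWidth (const std::uint8_t* src, int srcStride,
                            std::uint8_t* dst, int dstStride, int height) {
  for (int y = 0; y < height; ++y)
    std::memcpy (dst + y * dstStride, src + y * srcStride, kWidth);
}

// Generic entry point: branches on the width in every call.
// Returns false for widths this sketch does not handle.
inline bool CopyLuma (const std::uint8_t* src, int srcStride,
                      std::uint8_t* dst, int dstStride, int width, int height) {
  switch (width) {
    case 16: CopyFixedWidth<16> (src, srcStride, dst, dstStride, height); return true;
    case 8:  CopyFixedWidth<8> (src, srcStride, dst, dstStride, height);  return true;
    case 4:  CopyFixedWidth<4> (src, srcStride, dst, dstStride, height);  return true;
    default: return false;
  }
}
```

The extra cost over a 16-only routine is the single switch per call.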
Even though there is no SIMD-optimized version for the width==2
cases, the SIMD luma function still needs to handle them by
calling McCopyWidthEq2_c.
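A sketch of the fallback, under stated assumptions: the McCopyWidthEq2_c
below is a local stand-in with an assumed signature (the real one may
differ), and McCopyWide_stub and McCopySimdSketch are hypothetical names,
with plain C++ in place of actual SIMD code so the example runs anywhere.

```cpp
#include <cstdint>
#include <cstring>

// Local stand-in for the C routine named above; the signature is assumed.
inline void McCopyWidthEq2_c (const std::uint8_t* src, int srcStride,
                              std::uint8_t* dst, int dstStride, int height) {
  for (int y = 0; y < height; ++y) {
    dst[0] = src[0];
    dst[1] = src[1];
    src += srcStride;
    dst += dstStride;
  }
}

// Placeholder for the SIMD paths (hypothetical); handles the wider blocks.
inline void McCopyWide_stub (const std::uint8_t* src, int srcStride,
                             std::uint8_t* dst, int dstStride, int width, int height) {
  for (int y = 0; y < height; ++y)
    std::memcpy (dst + y * dstStride, src + y * srcStride, width);
}

// SIMD-flavoured entry point: width 2 has no SIMD version, so it is
// forwarded to the C routine instead of being left unhandled.
inline void McCopySimdSketch (const std::uint8_t* src, int srcStride,
                              std::uint8_t* dst, int dstStride, int width, int height) {
  if (width == 2) {
    McCopyWidthEq2_c (src, srcStride, dst, dstStride, height);
    return;
  }
  McCopyWide_stub (src, srcStride, dst, dstStride, width, height);
}
```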
This simplifies the UT code a little, and makes sure that
those codepaths are tested properly.
This reduces the compile time of the MC test files from 21.3 to
2.6 seconds.
This makes it slightly harder to see exactly which test failed
at a quick glance, but it makes the overall structure of the unit
test output more manageable and readable by reducing the number
of tests from 1300 to 430.
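One way a grouped test can still report which case failed is to collect
the failing sizes in a message. This is only a sketch of the idea, not
necessarily what the unit tests do; BlockMatchesReference and
RunGroupedSizeTest are hypothetical names, and the size list is made up.

```cpp
#include <sstream>
#include <string>

// Hypothetical predicate standing in for "optimized output equals the
// reference output" for one block size.
inline bool BlockMatchesReference (int width, int height) {
  return width > 0 && height > 0;
}

// One grouped test: loops over all size combinations and records the
// failing ones, instead of registering one test per combination.
inline std::string RunGroupedSizeTest (bool (*check) (int, int)) {
  static const int kSizes[] = {4, 8, 16};
  std::ostringstream failures;
  for (int w : kSizes)
    for (int h : kSizes)
      if (!check (w, h))
        failures << w << "x" << h << " ";
  return failures.str();  // an empty string means every combination passed
}
```

With this shape the test count drops, while the failure message still
names the width and height that went wrong.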