If the calling test hasn't set m_iPicResSize, it is set to the
maximum frame size, which takes much longer to initialize than the
current actual frame size.
This reduces the runtime of EncoderInterfaceTest.SkipFrameCheck
in valgrind from 229 seconds to 8 seconds, and the total runtime
of all the test cases in EncoderInterfaceTest from 405 seconds
to 89 seconds.
They are still used slightly differently in the encoder and decoder;
the decoder uses plain functions while the encoder uses one object
keeping track of the number of allocated bytes, and keeping track
of the requested alignment.
When generating a new version of the header, that includes the
actual git hash, don't overwrite the file that is tracked by git.
Instead create a new file, and include this only if the build system
indicates that it exists (by setting a define). This allows the
untouched source tree to be built from within an IDE even if make
has not been run.
This reduces the hassle with a file that needs to be ignored in the
git configuration.
The downside is that the generated file isn't used if building
from within an IDE, if the header has been updated by calling make
before (since the IDE configuration doesn't know whether the user
actually has run make). Since users of the IDE might not build via
make in the command line at all (in the same source checkout at least),
this should not be an issue in practice. The previous way things worked,
the version hash (generated by make) when used in an IDE could actually
be outdated and misleading.
This function actually zero-initializes the allocated memory, thus
make this clear in the function name.
This makes the function name match the same behaviour in the encoder.
the decoder did not produce valid output, call DrainComplete in Drain
and FlushComplete in Flush. Without these, the caller of the GMP may
end up waiting forever on a requested operation to complete.
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). By using the more generic versions, there may be a small
performance loss since the functions need to check the width
in every call. Actual measurements show that the actual change is
very small (and the shared routines turn out to actually be faster
than the existing ones in ARM NEON setups).