According to the Win64 ABI, these registers need to be preserved,
and compilers are allowed to rely on their content to stay
available - not only for float usage but for any usage, anywhere,
in the calling C++ code.
This adds a macro which pushes the clobbered registers onto the
stack if targeting win64 (and a matching one which restores them).
The parameter to the macro is the number of xmm registers used
(e.g. if using xmm0 - xmm7, the parameter is 8), or in other
words, the number of the highest xmm register used plus one.
This is similar to how the same issue is handled for the NEON
registers q4-q7 with the vpush instruction, except that they needed
to be preserved on all platforms, not only on one particular platform.
This allows removing the XMMREG_PROTECT_* hacks, which can
easily fail if the compiler chooses to use the callee saved
xmm registers in an unexpected spot.
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.
Functions that use more than 12 of the qX registers must push
the clobbered registers on the stack in order to be able to restore them
afterwards.
In functions that don't use all 16 registers, but clobber some of
the callee saved registers q4-q7, one or more of them are remapped
to reduce the number of registers that have to be saved/restored.
This incurs a very small (around 0.5%) slowdown in the decoder and
encoder.
According to the calling convention, the registers q4-q7 should be
preserved by functions. The caller (generated by the compiler) could
be using those registers anywhere for any intermediate data.
Functions that use 12 or less of the qX registers can avoid
violating the calling convention by simply using other registers instead
of the callee saved registers q4-q7.
This change only remaps the registers used within functions - therefore
this does not affect performance at all. E.g. in functions using
registers q0-q7, we now use q0-q3 and q8-q11 instead.
There's a different version of the same function in the encoder,
but they're not identical - the encoder version has got stricter
alignment requirements.
If someone can confirm that it is ok to use the function from the
encoder, pixel_sad_neon.S in processing could be deleted, and the
encoder version moved to codec/common instead.
The caller of the function should not need to know exactly which
implementation of it is being used.
For the variants that don't support detecting the number of cores,
the pNumberOfLogicProcessors parameter can be left untouched
and the caller will use a higher level API for finding it out.
This simplifies all the calling code, and simplifies adding
more implementations of cpu feature detection.
This reduces the huge amount of near-useless small extra files
scattered around for the sake of the platform demo projects.
This requires explicitly listing all the ncessary include paths.
TRUE/FALSE has intentionally been left in use for the few
platform specific APIs that define these constants themselves
and expect them to be used, for consistency.
The actual float and double data types are defined in C89 and are
usable as such without any extra typedefs.
Removing the extra typedefs simplifies the compatibility typedef
headers, simplifies portability and makes the code base easier
to work with for people new to the library.
int8_t in general should to be defined as signed char, since there
are actual envrionments where a plain 'char' is unsigned.
This also reduces the differences between the typedef headers of
the different sub-libraries.
This makes them match the same macros in the main decoder/encoder
libraries. long_t (which is typedeffed to long) actually is 64
bit on 64 bit unix platforms, which might not be what was
intended.
Just use the standard inline keyword with sufficient backwards
compatibility defines, similar to how it is done in the main decoder
and encoder libraries.
The corresponding files for the decoder and encoder were
removed in ccca04453a. The 2008 version can be imported into
MSVC 2010 and 2012 just fine, reducing the amount of project files
to keep in sync.
This fixes building on mingw-w64.
Include stdint.h on everything except MSVC for definitions of
common standard types, include stddef.h on MSVC instead, since
MSVC doesn't have stdint.h in all older versions that are
supposed to be supported, but MSVC always defines intptr_t via
stddef.h.
Commit f38111d76b updated these files
manually (based on older versions of them) to something not generated
by the current mktargets.py/sh, losing the compact pattern rules.