Move asm routines to common. Delete obsolete decoder routines.
Use wider routines where applicable.
Overall decode is ~1.07x faster in a quick 720p30 4 Mbps test on Haswell.
They are still used slightly differently in the encoder and decoder:
the decoder uses plain functions, while the encoder uses a single
object that keeps track of the number of allocated bytes and of the
requested alignment.
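
The two usage styles can be illustrated with a minimal sketch. All
names here (BlockHeader, AlignedMalloc, AlignedFree, TrackingAllocator)
are hypothetical and are not the project's actual API; the sketch only
assumes that alignments are powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical header stored directly in front of each aligned block.
struct BlockHeader {
  void* raw;         // pointer originally returned by malloc
  std::size_t size;  // payload size requested by the caller
};

// Plain-function style: over-allocate, round up to the alignment, and
// remember the raw pointer and size just before the aligned payload.
inline void* AlignedMalloc(std::size_t size, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);  // power of two (assumed)
  if (align < alignof(BlockHeader)) align = alignof(BlockHeader);
  void* raw = std::malloc(size + sizeof(BlockHeader) + align - 1);
  if (raw == nullptr) return nullptr;
  std::uintptr_t start =
      reinterpret_cast<std::uintptr_t>(raw) + sizeof(BlockHeader);
  std::uintptr_t aligned =
      (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  BlockHeader* header = reinterpret_cast<BlockHeader*>(aligned) - 1;
  header->raw = raw;
  header->size = size;
  return reinterpret_cast<void*>(aligned);
}

inline void AlignedFree(void* p) {
  if (p != nullptr) std::free((static_cast<BlockHeader*>(p) - 1)->raw);
}

// Object style: one allocator instance remembers the requested
// alignment and the number of bytes currently allocated through it.
class TrackingAllocator {
 public:
  explicit TrackingAllocator(std::size_t align) : align_(align), used_(0) {}

  void* Alloc(std::size_t size) {
    void* p = AlignedMalloc(size, align_);
    if (p != nullptr) used_ += size;
    return p;
  }

  void Free(void* p) {
    if (p == nullptr) return;
    used_ -= (static_cast<BlockHeader*>(p) - 1)->size;
    AlignedFree(p);
  }

  std::size_t BytesInUse() const { return used_; }

 private:
  std::size_t align_;
  std::size_t used_;
};
```

In this sketch the object is only a thin wrapper around the plain
functions, which is one way the two styles could share a single
implementation.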
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). By using the more generic versions, there may be a small
performance loss since the functions need to check the width
in every call. Measurements show that the actual change is
very small (and the shared routines turn out to be faster
than the existing ones in ARM NEON setups).
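
The per-call width check mentioned above can be sketched roughly as
follows. This is a simplified plain-copy example with hypothetical
names (CopyRows, McCopyLuma), not the project's actual routines; it
only shows how one generic entry point can dispatch to fixed-width
code for widths 4, 8 and 16.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fixed-width row copy; the width is a compile-time constant so the
// compiler can emit a specialized copy for each instantiation.
template <int kWidth>
static inline void CopyRows(std::uint8_t* dst, int dst_stride,
                            const std::uint8_t* src, int src_stride,
                            int height) {
  for (int y = 0; y < height; ++y) {
    std::memcpy(dst, src, kWidth);
    dst += dst_stride;
    src += src_stride;
  }
}

// Generic entry point: the switch is the extra work done on every
// call compared to a routine that only ever handles width 16.
inline void McCopyLuma(std::uint8_t* dst, int dst_stride,
                       const std::uint8_t* src, int src_stride,
                       int width, int height) {
  switch (width) {
    case 16: CopyRows<16>(dst, dst_stride, src, src_stride, height); break;
    case 8:  CopyRows<8>(dst, dst_stride, src, src_stride, height);  break;
    case 4:  CopyRows<4>(dst, dst_stride, src, src_stride, height);  break;
    default: assert(false && "unsupported luma block width");
  }
}
```

The cost of the generic version is a single branch per call, which is
consistent with the very small change observed in the measurements.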
This makes sure this is set to the exact same string in all
the configurations, simplifying editing multiple configurations
at the same time.
This changes the output directory for 64-bit binaries from
bin/win64 to bin/x64, but this is the common pattern used by
MSVC in new projects.
Use the "Program Database (/Zi)" in release mode and in debug
mode for x64, use "Program Database for Edit & Continue (/ZI)"
in debug mode for Win32.
This is how new Visual Studio projects are set up by default.
Enable it on the project level, instead of having to set separate options
for both compiler and linker.
The processing project actually had the options set in this way originally
as well.
This reverts commit 7aff66d40ccfc9c4daf11691c73f4d5c9a3cba34.
These CRLF marks are re-added by MSVC as soon as the project files
are updated from within the GUI anyway.
All the code that relies on separating them uses the built-in defines
_WIN32 and _WIN64, or the corresponding machine defines (such as
_M_IX86, for MSVC 32-bit inline assembly).
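
As a rough illustration of relying only on the built-in defines, a
check might look like the sketch below. The helper names (TargetName,
HasMsvcX86InlineAsm) are hypothetical; the macros themselves are the
compiler-provided ones. Note that _WIN32 is also defined for 64-bit
Windows targets, so _WIN64 has to be tested first.

```cpp
#include <cassert>
#include <cstring>

// Distinguish Windows targets using only compiler-provided macros.
inline const char* TargetName() {
#if defined(_WIN64)
  return "win64";        // 64-bit Windows (also defines _WIN32)
#elif defined(_WIN32)
  return "win32";        // 32-bit Windows
#else
  return "non-windows";
#endif
}

// MSVC supports inline assembly only when targeting 32-bit x86, which
// is what the _M_IX86 machine define indicates.
inline bool HasMsvcX86InlineAsm() {
#if defined(_MSC_VER) && defined(_M_IX86)
  return true;
#else
  return false;
#endif
}
```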
This changes the indentation from spaces to tabs, and adds missing
DOS newlines to these few lines.
This lets certain editors properly detect the file as using DOS
newlines.
According to "nasm -h", there is no -O3 parameter at all, and
the highest optimization level (-Ox) is already the default.
The corresponding parameter was never set when building with the
make build system.
Both encoder and decoder versions were functionally equivalent,
but I picked the decoder version (and added the static inline
keywords to it) since the encoder one was quite messy, with a lot
of commented-out code.
No code exists within the project for building such a trace library.
This also fixes building on OS X with -Wno-deprecated-declarations
removed, since this code contained calls to deprecated functions
within #ifdef MACOS, which are now enabled when building on OS X.