The astyle configuration makes sure normal code is indented consistently
with 2 spaces, but astyle doesn't seem to touch the indentation in
these multi-line macros.
This does the same cleanup as was done in 4fc27714bd on new occurrences
in the code base.
This fixes building with MSVC 2015 (which is still only a release
candidate).
This makes them consistent with the rest of the assembly source
files. Prior to f2314151e8, all the assembly files had consistent
indentation, but after that commit, this file had ended up formatted
differently.
Add Intra 8x8 support
Add no_deblocking support inside T8x8
Add CABAC parse support
Add static data sheet for dequant
Fix bugs and clean/astyle the code
Remove build warnings
Modify the UT cases
Fix the ParseNalHeader bug
This makes sure the code is at least somewhat tested. Ideally those
assembly functions should be upgraded to work in 64 bit mode as well,
but until then it's better to actually use them in the configurations
where they can be built.
The apple assembler for arm can handle the gnu binutils style
macros just fine these days, so there is no need to duplicate all
of these macros in two syntaxes, when the new one works fine in all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.
The apple assembler for arm64 can handle the gnu binutils style
macros just fine, so there is no need to duplicate all of these
macros in two syntaxes, when the new one works fine in all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.
This makes for a chroma plane of 2x2. The SIMD versions of the generic
downscalers assume that the width and height are at least 2, since they
run an unconditional loop for the body of the image and a separate step
for the last pixel and last row; that is, they assume that (width-1)
and (height-1) are larger than zero.
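As an illustration, a rough scalar sketch of that loop layout (not the
actual SIMD code; the function and parameter names here are made up):

    #include <stdint.h>

    // Scalar model of the loop layout: body rows/pixels run in
    // unconditional do/while loops, the last pixel of each row and the
    // last row are handled separately. This is only correct when
    // (iDstWidth - 1) > 0 and (iDstHeight - 1) > 0, i.e. at least 2x2.
    static void HalfScaleSketch (uint8_t* pDst, int iDstStride,
                                 const uint8_t* pSrc, int iSrcStride,
                                 int iDstWidth, int iDstHeight) {
      int y = 0;
      do {                                    // body rows
        int x = 0;
        do {                                  // body pixels
          pDst[y * iDstStride + x] = pSrc[2 * y * iSrcStride + 2 * x];
          x++;
        } while (x < iDstWidth - 1);
        pDst[y * iDstStride + x] =            // last pixel of the row
            pSrc[2 * y * iSrcStride + 2 * x];
        y++;
      } while (y < iDstHeight - 1);
      for (int x = 0; x < iDstWidth; x++)     // last row
        pDst[y * iDstStride + x] = pSrc[2 * y * iSrcStride + 2 * x];
    }

With iDstWidth == 1 or iDstHeight == 1, the do/while body still runs
once, and the separate last-pixel/last-row step then steps one element
past the intended area - which is why dimensions below 2 break this
kind of code.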
This is the same fix as in e8cdbd2ea7, but making sure it applies
to both dimensions; that commit only fixed it for one of the
dimensions.
This fixes spurious crashes in EncodeDecodeTestAPI.SimulcastSVC.
Previously this used unofficial, apple specific syntax (with fallback
macros for gnu binutils), since Xcode 5.x didn't support the official
syntax of these instructions. Since Xcode 6 has been out for quite a
number of months already, it should be safe to require this for
building 64 bit binaries for iOS (armv7 builds can still be built
with older Xcode versions).
This clarifies the code by avoiding apple specific syntax in the
assembler instructions.
This makes for a chroma plane of 2x2. The SIMD versions of the generic
downscalers assume that the width and height are at least 2, since they
run an unconditional loop for the body of the image and a separate step
for the last pixel and last row; that is, they assume that (width-1)
and (height-1) are larger than zero.
This fixes spurious crashes in EncodeDecodeTestAPI.SetOptionEncParamExt.
When calculating what resolution to actually downscale to,
the result can end up smaller than what the caller set. When
scaling down to resolutions close to the lower limit of allowed
values, this can end up producing values below that limit.
Previously, e.g. a downscale from 2266x8 to 566x2 would end
up as 566x1 after this calculation. When scaling to
566x1, the chroma plane gets a height of 0, which doesn't
make sense and which breaks e.g. the SSE2 scaler. Therefore,
make sure none of the dimensions end up set below 2.
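A minimal sketch of such a clamp (the function and variable names are
placeholders, not the encoder's actual ones):

    #include <algorithm>

    // Keep both downscaled dimensions at 2 or above, so that the halved
    // chroma plane never ends up with a zero dimension.
    static void ClampScaledSize (int& iScaledWidth, int& iScaledHeight) {
      iScaledWidth  = std::max (iScaledWidth,  2);
      iScaledHeight = std::max (iScaledHeight, 2);
    }

In the example above, the computed 566x1 would then become 566x2,
which keeps a valid 283x1 chroma plane.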
Normally, DownsamplePadding skips scaling if the target
size is the same as the source size, assuming that the caller
will use the source data pointer in that case. This is true
for the base layer (the first call to DownsamplePadding in
SingleLayerPreprocess), but when downsampling the other layers,
there is no special handling for the case when the target
is the same size as the source.
Previously, the encoding of such spatial layers would use
completely uninitialized data, encoding complete garbage.
Instead force DownsamplePadding to make a copy if no scaling
is required, for the dependency layers. The base layer still
avoids a copy unless scaling of that layer is required.
Whether it actually makes sense to have lower spatial layers
the same size as the original one is a different question
though - currently the code allows it, and
EncodeDecodeTestAPI.SetOptionEncParamExt will try to use it.
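A hypothetical sketch of that control flow (the real DownsamplePadding
in the encoder has a different signature and works on picture
structures; the names and parameters below are made up):

    #include <stdint.h>
    #include <string.h>

    // When the target size equals the source size, a dependency layer
    // still needs its own copy of the data; only the base layer may
    // keep referring to the source picture directly.
    static void DownsamplePaddingSketch (uint8_t* pDst, const uint8_t* pSrc,
                                         int iSrcWidth, int iSrcHeight,
                                         int iDstWidth, int iDstHeight,
                                         bool bForceCopy) {
      if (iSrcWidth == iDstWidth && iSrcHeight == iDstHeight) {
        if (bForceCopy)   // dependency layer: copy so it encodes valid data
          memcpy (pDst, pSrc, (size_t) iDstWidth * iDstHeight);
        // base layer (bForceCopy == false): caller reuses pSrc directly
      } else {
        // scale (and pad) pSrc down to iDstWidth x iDstHeight into pDst
      }
    }

The base layer call would pass bForceCopy as false and keep the old
behaviour; the dependency layer calls pass true, so equally sized
layers no longer encode uninitialized data.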
They are still used slightly differently in the encoder and decoder;
the decoder uses plain functions while the encoder uses one object
that keeps track of the number of allocated bytes and of the
requested alignment.
When generating a new version of the header that includes the
actual git hash, don't overwrite the file that is tracked by git.
Instead create a new file, and include this only if the build system
indicates that it exists (by setting a define). This allows the
untouched source tree to be built from within an IDE even if make
has not been run.
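The pattern is roughly the following (the define and header names
below are made up for illustration, not necessarily the ones used in
the project):

    // Set by the build system once it has generated the header with the
    // current git hash; a plain IDE build never defines it.
    #ifdef HAVE_GENERATED_VERSION_H
    #include "version_gen.h"     // generated by make, not tracked by git
    #else
    #include "version_static.h"  // tracked by git, has a placeholder version
    #endif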
This reduces the hassle with a file that needs to be ignored in the
git configuration.
The downside is that the generated file isn't used when building
from within an IDE, even if the header has been updated by calling
make before (since the IDE configuration doesn't know whether the user
actually has run make). Since users of the IDE might not build via
make in the command line at all (in the same source checkout at least),
this should not be an issue in practice. The way things previously
worked, the version hash (generated by make) as seen from an IDE
build could actually be outdated and misleading.
This function actually zero-initializes the allocated memory, so
make this clear in the function name.
This makes the function name match the same behaviour in the encoder.
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). By using the more generic versions, there may be a small
performance loss since the functions need to check the width in
every call. Actual measurements show that the difference is very
small (and the shared routines even turn out to be faster than the
existing ones in ARM NEON setups).
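As an illustration of the per-call width check, a placeholder sketch
(the function names are made up and don't correspond to the actual
routines):

    #include <stdint.h>

    static void CopyW16 (uint8_t*, int, const uint8_t*, int, int) { /* 16 wide kernel */ }
    static void CopyW8  (uint8_t*, int, const uint8_t*, int, int) { /* 8 wide kernel */ }
    static void CopyW4  (uint8_t*, int, const uint8_t*, int, int) { /* 4 wide kernel */ }

    // Decoder-style entry point: dispatch on the block width at every
    // call. An encoder-only variant could drop the checks and assume a
    // fixed width of 16.
    static void CopySketch (uint8_t* pDst, int iDstStride,
                            const uint8_t* pSrc, int iSrcStride,
                            int iWidth, int iHeight) {
      if (iWidth == 16)
        CopyW16 (pDst, iDstStride, pSrc, iSrcStride, iHeight);
      else if (iWidth == 8)
        CopyW8 (pDst, iDstStride, pSrc, iSrcStride, iHeight);
      else
        CopyW4 (pDst, iDstStride, pSrc, iSrcStride, iHeight);
    }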