This makes sure the code is at least somewhat tested. Ideally those
assembly functions should be upgraded to work in 64 bit mode as well,
but until then it's better to actually use them in the configurations
where they can be built.
The apple assembler for arm can handle the gnu binutils style
macros just fine these days, so there is no need to duplicate all
of these macros in two syntaxes, when the new one works fine in all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.
The apple assembler for arm64 can handle the gnu binutils style
macros just fine, so there is no need to duplicate all of these
macros in two syntaxes, when the new one works fine in all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.
This makes for a chroma plane of 2x2. The SIMD versionf of generic
downscalers assume that the width and height is at least 2, since
it does an unconditional loop for the body of the image, and a
separate step for the last pixel and last row. The SIMD versions
assume that (width-1) and (height-1) are larger than zero.
This is the same fix as in e8cdbd2ea7, but making sure it applies
to both dimensions, that commit only fixed it for one of the
dimensions.
This fixes spurious crashes in EncodeDecodeTestAPI.SimulcastSVC.
Previously this used unofficial, apple specific syntax (with fallback
macros for gnu binutils), since Xcode 5.x didn't support the official
syntax of these instructions. Since Xcode 6 has been out for quite a
number of months already, it should be safe to require this (for
building 64 bit binaries for iOS, armv7 builds can still be built
with older Xcode versions).
This clarifies the code by avoiding apple specific syntax in the
assembler instructions.
This makes for a chroma plane of 2x2. The SIMD versionf of generic
downscalers assume that the width and height is at least 2, since
it does an unconditional loop for the body of the image, and a
separate step for the last pixel and last row. The SIMD versions
assume that (width-1) and (height-1) are larger than zero.
This fixes spurious crashes in EncodeDecodeTestAPI.SetOptionEncParamExt.
When calculating what resolution to actually downscale to,
it can end up smaller than what the caller set. When scaling
down to resolutions close to the limit of allowed values,
this can end up setting values lower than the limit.
Previously, e.g. a downscale from 2266x8 to 566x2 will end
up as 566x1 after this calculation. When scaling to a
566x1, the chroma plane gets a height of 0, which doesn't
make sense, and which breaks e.g. the SSE2 scaler. Therefore,
make sure none of the dimensions end up set below 2.
Normally, the DownsamplePadding skips scaling if the target
size is the same as the source size, assuming that the caller
will use the source data pointer in that case. This is true
for the base layer (the first call to DownsamplePadding in
SingleLayerPreprocess), but when downsampling the other layers,
there is no special handling for the case when the target
is the same size as the source.
Previously, the encoding of such spatial layers will use
completely uninitialized data, encoding complete garbage.
Instead force DownsamplePadding to make a copy if no scaling
is required, for the dependency layers. The base layer still
avoids a copy unless scaling of that layer is required.
Whether it actually makes sense to have lower spatial layers
the same size as the original one is a different question
though - currently the code allows it, and
EncodeDecodeTestAPI.SetOptionEncParamExt will try to use it.
They are still used slightly differently in the encoder and decoder;
the decoder uses plain functions while the encoder uses one object
keeping track of the number of allocated bytes, and keeping track
of the requested alignment.
Use the decoder versions of the functions (which are capable
of handling widths 4/8/16 for luma, not only 16 as in the
encoder). By using the more generic versions, there may be a small
performance loss since the functions need to check the width
in every call. Actual measurements show that the actual change is
very small (and the shared routines turn out to actually be faster
than the existing ones in ARM NEON setups).