WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2 (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2 (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2 (~1.53x speedup over SSE4.1 on Haswell).
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.
Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
Initialize input arrays with different random values.
Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.
Reduce duplication.
In dc2cbe4, the previous test for GetOption that succeeds when the
decoder is initialized was removed. Add a GetOption call for a different
option, now that DECODER_OPTION_DATAFORMAT is removed.
Before dc2cbe4, the DecoderConfigParam function returned early
since DecoderSetCsp signaled a failure, which is why the uninitialized
parameters weren't read before.
This fixes valgrind warnings about conditional jumps depending on
uninitialized values.