Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.
Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Since we are clipping before we shift the values to
16 or 32 bits, we should not shift the min/max clip
values to compensate.
Fixes 8 and 24 bit lossy decoding.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This allows opting for a lower MTU than what the AVIOContext
indicated, and allows writing into outputs that don't indicate
an MTU at all (such as plain files, which is useful for testing).
This also allows querying for the MTU via the avoption.
Signed-off-by: Martin Storsjö <martin@martin.st>
This library does not fit into Libav as a whole and its code is just a
maintenance burden. Furthermore it is now available as an external project,
which completely obviates any reason to keep it around.
URL: http://git.videolan.org/?p=libpostproc.git
If the PNG filter is enabled, a PNG-style filter will run over the
input buffer, writing into the buffer. Therefore, if no zlib compression
was used, ensure that we copy into a temporary buffer, otherwise we
overwrite user-provided input data.
This prevents crashers and errors further down when reading nodes in the
empty tree.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
According to newer RFCs, this packetization scheme should only
be used for interfacing with legacy systems.
Implementing this packetization mode properly requires parsing
the full H263 bitstream to find macroblock boundaries (and knowing
their macroblock and gob numbers and motion vector predictors).
This implementation tries to look for GOB headers (which
can be inserted by using -ps <small number>), but if the GOBs
aren't small enough to fit into the MTU, the packetizer blindly
splits packets at any offset and claims it to be a GOB boundary
(by using Mode A from the RFC). While not correct, this seems
to work with some receivers.
Signed-off-by: Martin Storsjö <martin@martin.st>