The AVCodecContext.get_format callback is not only used for pixel format
negotiation with the API user, but also for hwaccel init. For the
latter, it's required that some codec parameters, in particular the
codec profile, are set when the callback is invoked.
This patch removes a get_format invocation where this is not guaranteed.
The codec parameters, including the profile, are really set further
below. (The same code path that sets the profile also calls get_format
properly too.)
This just happened to work by coincidence in most cases. For example, if
the API user just copied or reused the AVStream's AVCodecContext when
decoding, the profile would be set properly. But in some cases it
fails., such as with the sample WolfensteinTwitch.mp4 on the samples
server.
Remove the redundant get_format call. Apparently it serves no purpose
anymore, although it is possible that this was different at the time it
was added in commit ffd77f94a2.
This fixes hwaccel usage for API users which do not set the profile
when setting up the AVCodecContext (which is allowed).
This allows more efficient access to the array as the level and flags
are contiguous. Around 4% faster coefficient decoding.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This makes the h.264 decoder threadsafe to initialize.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Work on the AVFrame references directly.
Instead of setting up a flipped/swapped "view" on the pictures,
flip/swap them when returning decoded frames to the API user.
Rather than copying data buffers around, allocate a proper frame, and
use the standard AVFrame functions. This effectively makes the decoder
capable of direct rendering.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This avoid going through constants.c while still sharing them
with proresdsp.asm
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch fixes build of AAC encoder optimized for mips that was broken due
to some changes in generic code that were not propagated to the optimized code.
Also, some functions in the optimized code are basically duplicate of functions
from generic code. Since they do not bring enough improvement to the optimized
code to justify their existence, they are removed (which improves
maintainability of the optimized code).
Optimizations disabled in 97437bd are enabled again.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a half-
assed pmaddqd using 2xpmaddwd. It's currently sse2-only, if anyone sees
potential in adding ssse3, I'd love to hear it.
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C: 78902 decicycles in idct, 262071 runs, 73 skips
avx: 32478 decicycles in idct, 262045 runs, 99 skips
Difference between the 2:
stddev: 0.39 PSNR:104.47 MAXDIFF: 2
This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.
In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.
Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>