Commit Graph

33918 Commits

Author SHA1 Message Date
Michael Niedermayer
5063a18f56 avcodec/ffv1dec: update progress in case of broken pointer chains
Fixes deadlock
Fixes Ticket4932

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-16 22:25:20 +02:00
Michael Niedermayer
4c2d4e8700 avcodec/ffv1dec: Clear slice coordinates if they are invalid or slice header decoding fails for other reasons
Fixes Ticket4931

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-16 21:14:56 +02:00
Agatha Hu
2b5dda3f48 avcodec/nvenc: fix b frame n_quant_offset
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2015-10-16 18:24:10 +02:00
Paul B Mahol
8b11e43799 avcodec: add ADPCM PSX decoder
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-16 16:37:40 +02:00
Hendrik Leppkes
e12908d71e vp9: use AVFrame.buf[0] to check if a frame is valid
AVFrame.data[0] is not guaranteed to be set with a HWAccel
2015-10-16 14:53:41 +02:00
Michael Niedermayer
c980c5e54d avcodec/jpeg2000dec: Clear properties in jpeg2000_dec_cleanup() too
Fixes: Ticket4878

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-15 22:00:49 +02:00
wm4
17fe18d21a Revert "avcodec/h264: remove redundant and bogus get_format call"
This reverts commit be583c6fd3.

This was not approved, and was accidentally pushed. I'm very sorry.
2015-10-15 20:19:55 +02:00
wm4
be583c6fd3 avcodec/h264: remove redundant and bogus get_format call
The AVCodecContext.get_format callback is not only used for pixel format
negotiation with the API user, but also for hwaccel init. For the
latter, it's required that some codec parameters, in particular the
codec profile, are set when the callback is invoked.

This patch removes a get_format invocation where this is not guaranteed.
The codec parameters, including the profile, are really set further
below. (The same code path that sets the profile also calls get_format
properly too.)

This just happened to work by coincidence in most cases. For example, if
the API user just copied or reused the AVStream's AVCodecContext when
decoding, the profile would be set properly. But in some cases it
fails, such as with the sample WolfensteinTwitch.mp4 on the samples
server.

Remove the redundant get_format call. Apparently it serves no purpose
anymore, although it is possible that this was different at the time it
was added in commit ffd77f94a2.

This fixes hwaccel usage for API users which do not set the profile
when setting up the AVCodecContext (which is allowed).
2015-10-15 20:16:13 +02:00
Derek Buitenhuis
1a29804558 aac: Make codec init run under ff_thread_once
This makes AAC init threadsafe.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2015-10-15 13:48:55 -04:00
Hendrik Leppkes
6e719dc6bb vp9: expose reference frames in VP9SharedContext
2015-10-15 13:02:23 +02:00
Ronald S. Bultje
b95f241b6e vp9: split header into separate struct and expose in vp9.h
This allows hwaccels to access the bitstream header information.
2015-10-15 13:02:20 +02:00
Christophe Gisquet
96b165fae2 dnxhd: interleave AC levels and flags
This allows more efficient access to the array as the level and flags
are contiguous. Around 4% faster coefficient decoding.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-15 02:46:29 +02:00
Hendrik Leppkes
15db457ea8 Merge commit 'd15368ee3926152a3a301c13cc638fbf7a062ddf'
* commit 'd15368ee3926152a3a301c13cc638fbf7a062ddf':
  h264: Run VLC init under pthread_once

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 23:06:06 +02:00
Hendrik Leppkes
b66a94ab53 Merge commit '08377f9c3bf6dbe216512a2e05c9fac837b13fc0'
* commit '08377f9c3bf6dbe216512a2e05c9fac837b13fc0':
  dxva: Include last the internal header

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 23:02:00 +02:00
Hendrik Leppkes
8ededd5836 Merge commit '6a23a34274b747280c1e4a00ad22f97f99bbb48a'
* commit '6a23a34274b747280c1e4a00ad22f97f99bbb48a':
  mimic: drop AVPicture usage

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 15:01:54 +02:00
Hendrik Leppkes
3d93ff289e Merge commit '6fdd4c678ac1ce0776f9645cd534209e5f1ae1e3'
* commit '6fdd4c678ac1ce0776f9645cd534209e5f1ae1e3':
  libschroedinger: Properly use AVFrame API

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 15:00:53 +02:00
Hendrik Leppkes
9c3f75c29d Merge commit '901f9c0a32985f48672fd68594111dc55d88a57a'
* commit '901f9c0a32985f48672fd68594111dc55d88a57a':
  qtrle: Properly use AVFrame API

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 14:56:16 +02:00
Derek Buitenhuis
d15368ee39 h264: Run VLC init under pthread_once
This makes the h.264 decoder threadsafe to initialize.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-10-14 14:35:34 +02:00
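A minimal sketch of the pattern this commit names, assuming POSIX threads. The names demo_table, demo_init_tables and demo_decoder_init are hypothetical stand-ins for illustration only; they are not taken from the h264 decoder. The point is that pthread_once() guarantees the table initializer runs exactly once, even when several decoder instances are opened concurrently.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared table, standing in for a decoder's static VLC tables. */
static int demo_table[16];
static int demo_init_calls;   /* counts how often the initializer ran */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* Runs at most once per process, however many threads request it. */
static void demo_init_tables(void)
{
    for (int i = 0; i < 16; i++)
        demo_table[i] = i * i;
    demo_init_calls++;
}

/* Hypothetical per-instance init: safe to call from several threads. */
static int demo_decoder_init(void)
{
    /* pthread_once() returns 0 on success. */
    return pthread_once(&demo_once, demo_init_tables);
}

/* Read-only access after init; the table is never written again. */
static int demo_table_lookup(int i)
{
    assert(i >= 0 && i < 16);
    return demo_table[i];
}
```

Calling demo_decoder_init() a second time is a no-op, which is what makes opening several decoder instances in parallel safe in this sketch.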
Luca Barbato
08377f9c3b dxva: Include last the internal header
It redefines _WIN32_WINNT, possibly causing problems with the
w32pthreads.h header.
2015-10-14 14:35:34 +02:00
Hendrik Leppkes
037b44a3b4 Merge commit '00332e0a064dad866812de9162b009cbaba6f5df'
* commit '00332e0a064dad866812de9162b009cbaba6f5df':
  wrapped_avframe: Initial implementation

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2015-10-14 13:26:17 +02:00
wm4
6a23a34274 mimic: drop AVPicture usage
Work on the AVFrame references directly.

Instead of setting up a flipped/swapped "view" on the pictures,
flip/swap them when returning decoded frames to the API user.
2015-10-14 11:25:53 +02:00
Vittorio Giovara
6fdd4c678a libschroedinger: Properly use AVFrame API
Rather than copying data buffers around, allocate a proper frame, and
use the standard AVFrame functions. This effectively makes the decoder
capable of direct rendering.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-10-14 11:24:55 +02:00
Vittorio Giovara
901f9c0a32 qtrle: Properly use AVFrame API
Rather than copying data buffers around, just add a reference to
the current frame.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2015-10-14 11:24:24 +02:00
James Almer
74a87ae210 x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-13 20:21:33 -03:00
Christophe Gisquet
234369d0fd dnxhdenc: fix access outside of image
This is the same test as for the 8bit case.
2015-10-13 18:53:10 -03:00
Christophe Gisquet
74c414202f x86: simple_idct10_template: use const
This avoids going through constants.c while still sharing them
with proresdsp.asm

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 22:52:33 +02:00
Nedeljko Babic
de262d018d avcodec/mips/aaccoder_mips: Sync with the generic code
This patch fixes the build of the AAC encoder optimized for MIPS, which was broken
by changes in the generic code that were not propagated to the optimized code.

Also, some functions in the optimized code are basically duplicates of functions
from the generic code. Since they do not bring enough improvement to the optimized
code to justify their existence, they are removed (which improves
maintainability of the optimized code).

Optimizations disabled in 97437bd are enabled again.

Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 17:22:56 +02:00
Ronald S. Bultje
e578638382 vp9: use registers for constant loading where possible.
2015-10-13 11:06:01 -04:00
Ronald S. Bultje
408bb8556f vp9: refactor itx coefficients and share between 8 and 10/12bpp.
2015-10-13 11:06:01 -04:00
Ronald S. Bultje
eb4b5ff738 vp9: add itxfm_add eob shortcuts to 10/12bpp functions.
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
2015-10-13 11:06:01 -04:00
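The column-skipping shortcut described in this commit can be sketched in plain C. This is an illustration under stated assumptions, not the SIMD code from the commit: toy_itx_1d is a hypothetical 4-point Hadamard butterfly standing in for the real inverse transform, and all function names are invented here. Because the 1-D transform is linear, an all-zero column transforms to zeros, so the column pass can be replaced by writing zeros.

```c
#include <assert.h>
#include <stdint.h>

#define N 4

/* Toy 1-D transform (4-point Hadamard); a stand-in for a real itx. */
static void toy_itx_1d(const int32_t in[N], int32_t out[N])
{
    int32_t a = in[0] + in[1], b = in[0] - in[1];
    int32_t c = in[2] + in[3], d = in[2] - in[3];
    out[0] = a + c; out[1] = b + d; out[2] = a - c; out[3] = b - d;
}

/* Reference: column pass over every column, then row pass. */
static void itx_2d_full(const int32_t *in, int32_t *out)
{
    int32_t tmp[N * N], col[N], res[N];
    assert(in && out);
    for (int x = 0; x < N; x++) {
        for (int y = 0; y < N; y++) col[y] = in[y * N + x];
        toy_itx_1d(col, res);
        for (int y = 0; y < N; y++) tmp[y * N + x] = res[y];
    }
    for (int y = 0; y < N; y++)
        toy_itx_1d(&tmp[y * N], &out[y * N]);
}

/* Shortcut: all-zero columns skip the column pass and just feed
 * zeros to the row pass. Returns the number of skipped columns. */
static int itx_2d_skip(const int32_t *in, int32_t *out)
{
    int32_t tmp[N * N], col[N], res[N];
    int skipped = 0;
    assert(in && out);
    for (int x = 0; x < N; x++) {
        int nonzero = 0;
        for (int y = 0; y < N; y++) {
            col[y] = in[y * N + x];
            nonzero |= col[y] != 0;
        }
        if (!nonzero) {
            for (int y = 0; y < N; y++) tmp[y * N + x] = 0;
            skipped++;
            continue;
        }
        toy_itx_1d(col, res);
        for (int y = 0; y < N; y++) tmp[y * N + x] = res[y];
    }
    for (int y = 0; y < N; y++)
        toy_itx_1d(&tmp[y * N], &out[y * N]);
    return skipped;
}
```

In a real decoder the set of empty columns would presumably be derived from the end-of-block position rather than by scanning the block, which is where the saving comes from; the scan here only keeps the sketch self-contained.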
Ronald S. Bultje
488fadebbc vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
3d0ca2fe89 vp9: 10/12bpp sse2 SIMD for iadst16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
0e80265b0a vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
1338fb79d4 vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16.
2015-10-13 11:06:00 -04:00
Ronald S. Bultje
cb054d061a vp9: add 10/12bpp sse2 SIMD versions of iadst8x8.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
e0610787b2 vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
a35f6bdb38 vp9: add 12bpp sse2 versions of iadst4.
2015-10-13 11:05:59 -04:00
Ronald S. Bultje
235e76aeb8 vp9: initial attempt at an idct_idct_4x4 12bpp x86 simd (sse2) impl.
The trouble with this function is that intermediates overflow 31+sign
bits, so I've added some helpers (that will also be used in 10/12bpp
8x8, 16x16 and 32x32) to make that easier, basically emulating a half-
assed pmaddqd using 2xpmaddwd. It's currently sse2-only; if anyone sees
potential in adding ssse3, I'd love to hear it.
2015-10-13 11:05:58 -04:00
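One way a wide multiply can be emulated with 16-bit multiplies is to split the input into a high and a low part, multiply each part separately, and recombine. The scalar sketch below illustrates that idea only; the commit message does not spell out the exact scheme, so the 14-bit split, the rounding constant 8192 and both function names are assumptions made for illustration. Each partial product models one 16x16->32 lane of pmaddwd (the real instruction also adds adjacent lane pairs, which is omitted here).

```c
#include <assert.h>
#include <stdint.h>

/* Reference: 64-bit intermediate, round and shift down by 14 bits. */
static int32_t mul_round14_ref(int32_t x, int16_t c)
{
    return (int32_t)(((int64_t)x * c + 8192) >> 14);
}

/* Split version: every product fits in 32 bits.
 * Assumes arithmetic right shift of negative values, as on all
 * mainstream compilers (implementation-defined in ISO C). */
static int32_t mul_round14_split(int32_t x, int16_t c)
{
    /* hi must fit in a signed 16-bit lane for a pmaddwd-style multiply. */
    assert(x >= -(1 << 29) && x < (1 << 29));
    int32_t hi = x >> 14;        /* floor(x / 2^14), 16 bits incl. sign */
    int32_t lo = x & 0x3fff;     /* low 14 bits, always non-negative    */
    /* x*c = hi*c*2^14 + lo*c, so the shift only affects the low term. */
    return hi * c + ((lo * c + 8192) >> 14);
}
```

The identity is exact because hi * c * 2^14 is a multiple of 2^14, so rounding and shifting only ever touch the low partial product.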
Ronald S. Bultje
f76423d097 vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
6b579cf547 vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4.
2015-10-13 11:05:58 -04:00
Ronald S. Bultje
1c3be32533 vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.
2015-10-13 11:05:57 -04:00
Christophe Gisquet
b6594a9605 x86: dct-test: add more idcts
In particular for 10 and 12 bits.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 16:03:04 +02:00
Michael Niedermayer
a745d1a9e4 avcodec/dct-test: Print failure notice below the failed *dct
This makes it easier to see where a failure happens

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 16:03:03 +02:00
Christophe Gisquet
7ece8b50b1 x86: simple_idct: 12bits versions
On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
C:         78902 decicycles in idct,  262071 runs,     73 skips
avx:       32478 decicycles in idct,  262045 runs,     99 skips

Difference between the 2:
stddev:    0.39 PSNR:104.47 MAXDIFF:    2

This is unavoidable and due to the scale factors used in the x86
version, which cannot match the C ones.

In addition, the trick of adding an initial bias to the input of a
pass can overflow, as the input coefficients are already 15bits,
which is the maximum this function can handle.

Overall, however, the omse on 12 bits samples goes from 0.16916 to
0.16883. Reducing rowshift by 1 improves to 0.0908, but causes
overflows.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 15:34:32 +02:00
Derek Buitenhuis
17e41cf361 avcodec: Do not lock during init if there is no init function
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-10-13 13:43:29 +02:00
Christophe Gisquet
4369b9dc7b x86: simple_idct(_put): 10bits versions
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires adding offsets in different places compared to
prores or C, and makes the function approximately 2% slower.

For 16 frames of a DNxHD 4:2:2 10bits test sequence:

C:    60861 decicycles in idct, 1048205 runs,    371 skips
sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
avx:  26272 decicycles in idct, 1048171 runs,    405 skips

The add version is not implemented, so the corresponding dsp
function is set to NULL to make it obvious if any code tries to execute it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 13:32:21 +02:00
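For readers unfamiliar with the put/add distinction, here is a rough C sketch of what clipping to [0;1023] means for a 10-bit "put" versus an "add". The helper names are hypothetical and the code is scalar; it does not reproduce the x86 implementation or its rounding offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a value to the 10-bit pixel range [0, 1023]. */
static uint16_t clip_pixel10(int32_t v)
{
    return (uint16_t)(v < 0 ? 0 : v > 1023 ? 1023 : v);
}

/* "put": overwrite the destination with the clipped IDCT output. */
static void put_row10(uint16_t *dst, const int32_t *idct_out, size_t n)
{
    assert(dst && idct_out);
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_pixel10(idct_out[i]);
}

/* "add": add the IDCT output to the existing prediction, then clip.
 * Shown only for contrast with the put variant. */
static void add_row10(uint16_t *dst, const int32_t *idct_out, size_t n)
{
    assert(dst && idct_out);
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_pixel10((int32_t)dst[i] + idct_out[i]);
}
```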
Christophe Gisquet
e652f69b35 x86: simple_idct10_template: fix overflow in pass
When the input of a pass has 15 or 16 bits of precision (in particular
the column pass), the addition of a bias to W4 may lead to overflows
in the input to pmaddwd.

This requires postponing the adding of the bias to after the first
butterfly. To do so, the fact that m15 is zeroed but otherwise unused is
exploited. In case the pass is safe, an address can be directly used,
and the number of xmm regs can be decreased. Otherwise, the 32bits bias
is loaded into it.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 12:51:10 +02:00
Christophe Gisquet
2fd14dd8eb avcodec/simple_idct10: improve precision
omse goes from 0.03060703 (which fails for dct-test) to 0.01663750.
This also actually improves the error of decoding the sample generated
by fate-vsynth3-dnxhd1080i-10bit using simple_idct10 to FAANI, which
goes (when resampled to yuv422p) from:
stddev:    0.06 PSNR: 72.28 MAXDIFF:    1
to identical.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 02:10:51 +02:00
Christophe Gisquet
e9a68b0316 x86: prores: templatize 10 bits simple_idct
This should be reused for a generic simple_idct10 function.
Requires a bit of trickery to declare common constants in C.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13 01:10:34 +02:00
Rostislav Pehlivanov
93e6b23c9f aacenc: shorten name of ff_aac_adjust_common_prediction
To keep it similar to the other functions which are all named *_pred.
2015-10-12 23:33:07 +01:00