The function is supposed to confirm that the compiler provided enough
alignment, but in practice it is only run in certain code paths and
insufficient alignment problems are restricted to legacy compilers.
This reverts commit 691dec6201.
The commit did not fix ticket #3215, it was fixed one commit earlier.
The revert may break other use-cases but they should be fixed differently,
the offending commit introduced too many problems.
Fixes ticket #3377.
Fixes ticket #3378.
* cehoyos/master:
Define ff_log2_run[] in libavcodec/internal.h.
Replace an incorrect av_free() in movenc.c with av_freep().
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ed06e5d92b4c67b49068d538461fbbe0a53a8c5e':
hevc: Do not turn 32bit timebases into negative numbers
Conflicts:
libavcodec/hevc.c
See: bf2ce19e51
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* cus/stable:
libzvbi-teletextdec: split dvb packet to slices
libzvbi-teletextdec: use av_dlog where possible
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Based on the aarch64 asm. CPU cycle counts on cortex-a9 compared to
gcc 4.8.2:
before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips
after: 393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips
Overall speedup is above 2%. Code generated by clang 3.4 is slower on
the same hardware and the relative change is a little larger.
Based on the x86 branchless get_cabac asm. get_cabac_noinline() gets
approximately 20% faster (no cycle counts available) compared to clang
from Xcode 5.1 beta5. More than 6% faster overall. A part of the overall
speedup might be explained by additional inlining of get_cabac().
Instead of using the demux function of libzvbi to split the packet to slices
(vbi lines), lets do it ourselves.
- eliminates the 1 frame delay between page input and output
- handles non-ascending line numbers more gracefully
- enables us to return error codes on some invalid packets instead of silently
ignoring them
Signed-off-by: Marton Balint <cus@passwd.hu>
The overread avoidance fix in cbddee1cca
broke the computation for the last row since it prevented the safe
reading from the height+1-th row.
CC: libav-stable@libav.org
Old Intel GPUs expect the reference frame index to the actual surface,
instead of the index into RefFrameList as specified by the spec.
This workaround should be set when using one of the "ClearVideo" decoder
devices.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The latest H.264 DXVA specification states that the index in this
structure should refer to a valid entry in the RefFrameList of the picture
parameter structure, and not to the actual surface index.
Fixes H.264 DXVA2 decoding on recent Intel GPUs (tested on Sandy and Ivy)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b66382101cff33e2ce66500327a90d0a105eedeb':
dxva2: Increase maximum number of slices for mpeg2
See: bceeccc648
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'd48430c367947a64647c6959cf472f2c01778b17':
build: Let the SVQ3 decoder depend on the H.264 decoder
Conflicts:
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Some content requires an higher number of slices in order to
render properly.
Rise the number to 1024 and warn if ever it exceeds.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Also undo the changes to ra144enc.c from previous commits.
Should fix ticket #3429
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The SVQ3 decoder reuses large parts of the H.264 decoder so it
makes no sense to enable the former but not the latter.
Also drop unnecessary h263.o object from SVQ3 decoder object list.
* commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d':
x86: cabac: Use correct #includes to make header compile standalone
Merged-by: Michael Niedermayer <michaelni@gmx.at>
c3390fd56c made use of the DSP function
but did not complement it with a call to emms, which is done here before
computations involving floats are performed.
Fixes ticket #3429, which affected MMX/MMXExt machines.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'eeaf4f3b87815cbae4c12856cfaafb3a2dae8e0c':
av_vdpau_get_profile: mask out H.264 intra profile flag
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Should fix compilation failures with --disable-yasm on some compilers
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5397386effba2e53e4ff82852a86f6be4d59e9c1':
mathops: move macro to the only place it is used
Merged-by: Michael Niedermayer <michaelni@gmx.at>
With cli usage the decoder might have not set the colorspace during
encoder init, manual colorspace override might be needed in such
cases.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This applies commit 5de64bb3 (the source of the above commit message)
to libutvideoenc as well.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: 387713a12dc5cfa27fcb4178084ce1ea-asan_stack-oob_131176a_1182_cov_3861068719_CAINIT_C_SHARP_3.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array accesses
This should not affect any release
Fixes: 8ab69af9e5a7a7e20fe04cdd25c0d6e7-asan_heap-oob_e72b82_5505_cov_2278389485_g2m4.wmv
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This reverts the changes 6467209836
and 68c3ed936a did to the SSE2 version,
which generated a hit of about 5 cycles.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Avoid a division by 0 in ff_mpeg4_set_one_direct_mv.
Sample-Id: 00000168-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* commit 'd1f9563d502037239185c11578cc614bdf0c5870':
pthread_frame: flush all threads on flush, not just the first one
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '2f02bbcca050936686482453078e83dc25493da0':
build: Let the ffvhuff decoder/encoder depend on the huffyuv decoder/encoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '34150be515cd9c43b0b679806b8d01774960af78':
build: Let the iac decoder depend on the imc decoder
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8e0cf39faf02536dca08f4fe628a66d1ae022fde':
build: Let all MJPEG-related decoders depend on the MJPEG decoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0a36988e48dd581d29e77f768f987738bdf365f0':
build: Let AMV decoder depend on the SP5X decoder
Conflicts:
configure
libavcodec/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '17a63ff0cd187b9e50e4a47862750295976853b1':
h264: update flag name in ff_h264_decode_ref_pic_list_reordering()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e70ab7c1f5005041bba0e4efc1165410f83495b2':
h264: add MVCD to the list of High profiles in SPS
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new function has the ability to allocate the structure, allowing it to grow
without needing major bumps
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This structure is used in the interface between libs and thus cannot have
fields added in the middle without major bump
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
avcodec_flush_buffers() must release all internally held references
according to its documentation, for which all the threads need to be
flushed.
CC:libav-stable@libav.org
Bug-Id: vlc/9665
These codecs compile all of the MJPEG code anyway, so there is little
point in not enabling the MJPEG decoder directly. This also simplifies
the dependency declarations for the MJPEG codec family.
This codec compiles all of the SP5X code anyway, so there is little
point in not enabling the decoder directly. This also simplifies the
dependency declaration for the AMV decoder.
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '92e598a57a7ce4b8ac9ea56274af39f5fd888311':
prores: Drop DSP infrastructure for prores encoder bits
Conflicts:
libavcodec/Makefile
libavcodec/proresdsp.c
libavcodec/proresenc_kostya.c
Note, these changes only affect one of the 2 prores encoders we have
If someone wants to add optimizations to the affected encoder, or needs/wants
this infrastructure, then iam happy to revert this
Merged-by: Michael Niedermayer <michaelni@gmx.at>
AAC LOAS can have new audio config objects in the stream itself.
Make sure the decoder reconfigures itself when the first one arrives
midstream.
Bug-Id: 644
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.
Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.
The decode_hf implementations have following timings:
For x86 Arrandale:
C SSE SSE2 SSE4
win32: 260 162 119 104
win64: 242 N/A 89 72
The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
Based on a patch from Christophe Gisquet.
Unrolling of the m == 0 case avoids a possible use of the uninitilized
value sum when s->predictor_history is not set. I failed to find a
sample for it. It also reduced the cycle count from 220 to 150 on
sandy bridge, x86_64 linux, gcc 4.8.2 compared to his patch.
Timings for Arrandale:
C SSE
win32: 2108 334
win64: 1152 322
Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.
Unrolling for ARCH_X86_64 is a 20 cycles gain.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>