This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
AS libavcodec/arm/ac3dsp_armv6.o
ffmpeg-src/libavcodec/arm/ac3dsp_armv6.S: Assembler messages:
ffmpeg-src/libavcodec/arm/ac3dsp_armv6.S:40: Error: selected processor
does not support `movw r8,#0x1fe0'
make[1]: *** [libavcodec/arm/ac3dsp_armv6.o] Error 1
MOVW is ARMv7 way to load constant:
* movw, or move wide, will move a 16-bit constant into a register,
implicitly zeroing the top 16 bits of the target register.
* movt, or move top, will move a 16-bit constant into the top half
of a given register without altering the bottom 16 bits
To load 32 bit constant, movw lower16; movt upper16; is better than
ldr if available, because:
While this approach takes two instructions, it does not require any
extra space to store the constant so both the movw/movt method and the
ldr method will end up using the same amount of memory. Memory
bandwidth is precious in and the movw/movt approach avoids an extra
read on the data side, not to mention the read could have missed the
cache.
But here it is armv6 optimization, so that we have to use ldr.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
psymodel: extend API to include PE and bit allocation.
avio: always compile dyn_buf functions
Remove unnecessary parameter from ff_thread_init() and fix behavior
Revert "aac_latm_dec: use aac context and aac m4ac"
configure: tell user if libva is enabled like the rest of external libs.
Add silence support for AV_SAMPLE_FMT_U8.
avio: make URL_PROTOCOL_FLAG_NESTED_SCHEME internal
avio: deprecate av_url_read_seek
avio: deprecate av_url_read_pause
ac3enc: NEON optimised extract_exponents
Conflicts:
libavcodec/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: fix partial run when no samples path is specified
ARM: NEON fixed-point forward MDCT
ARM: NEON fixed-point FFT
lavf: bump minor version and add an APIChanges entry for avio changes
avio: simplify url_open_dyn_buf_internal by using avio_alloc_context()
avio: make url_fdopen internal.
avio: make url_open_dyn_packet_buf internal.
avio: avio_ prefix for url_close_dyn_buf
avio: avio_ prefix for url_open_dyn_buf
avio: introduce an AVIOContext.seekable field
ac3enc: use generic fixed-point mdct
lavfi: add fade filter
Change yadif to not use out of picture lines.
lavc: deprecate AVCodecContext.antialias_algo
lavc: mark mb_qmin/mb_qmax for removal on next major bump.
Conflicts:
doc/filters.texi
libavcodec/ac3enc_fixed.h
libavcodec/ac3enc_float.h
libavfilter/Makefile
libavfilter/allfilters.c
libavfilter/vf_fade.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
ac3enc: ARM optimised ac3_compute_matissa_size
ac3: armv6 optimised bit_alloc_calc_bap
fate: simplify fft test rules
avio: document avio_alloc_context.
lavf: make compute_chapters_end less picky.
sierravmd: fix Indeo3 videos
FFT: simplify fft8()
fate: add fixed-point fft/mdct tests
Fixed-point support in fft-test
ape: check that number of seektable entries is equal to number of frames
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* newdev/master:
ac3enc: move compute_mantissa_size() to ac3dsp
ac3enc: move mant*_cnt and qmant*_ptr out of AC3EncodeContext
Remove support for stripping executables
ac3enc: NEON optimised float_to_fixed24
ac3: move ff_ac3_bit_alloc_calc_bap to ac3dsp
dfa: protect pointer range checks against overflows.
Duplicate: mimic: implement multithreading.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* newdev/master:
matroskadec: set default duration for simple block
When building for MinGW32 disable strict ANSI compliancy.
ARM: fix ff_apply_window_int16_neon() prototype
configure: check for --as-needed support early
ARM: NEON optimised apply_window_int16()
ac3enc: NEON optimised shift functions
ac3enc: NEON optimised ac3_max_msb_abs_int16 and ac3_exponent_min
mpeg12.c: fix slice threading for mpeg2 field picture mode.
ffmetadec.c: fix compiler warnings.
configure: Don't explicitly disable ffplay or in/outdevices on dos
configure: Remove the explicit disabling of ffserver
configure: Add fork as a dependency to ffserver
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 8b454c352f49c2a61db37793d838b553db3da734)
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.
Signed-off-by: Mans Rullgard <mans@mansr.com>
6% faster SSE FFT on Conroe, 2.5% on Penryn.
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
(cherry picked from commit e6b1ed693ae4098e6b9eabf938fc31ec0b09b120)
Approximately 5% faster on Cortex-A8.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a7878c9f73c12cfa685bd8af8f3afcca85f56a8b)
Approximately 3% faster on Cortex-A8.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 7da48fd0111adf504cfcfc5ebda7fd0681968041)
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003b0ec16fdb6574913781313fb2c7ab6)
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit c73d99e672329c8f2df290736ffc474c360ac4ae)
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This moves the fields needed by asm near the top, before any
structs or other members which complicate the offset calculation.
Modifying other structs will no longer require updating the offsets,
and the asm code is slightly simpler due to the smaller offsets.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit d461a4731781e492d83ef254f9c0fbd0ce6e47eb)
This moves the fields needed by asm near the top, before any
structs or other members which complicate the offset calculation.
Modifying other structs will no longer require updating the offsets,
and the asm code is slightly simpler due to the smaller offsets.
Signed-off-by: Mans Rullgard <mans@mansr.com>