I found that the optimisation of processing 2 samples per iteration
obscured the underlying algorithm. I had to write it out on paper and
translate it into a mathematical sum to see that the two samples are
unconnected. I hope this will be useful to anyone else who is struggling
to understand the code.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
vector_fmul and vector_fmac_scalar are guaranteed to be called on
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
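A minimal plain-C sketch of the unrolling idea described above, not the
actual SSE assembly. The function names fmac_scalar_ref and
fmac_scalar_unrolled are hypothetical; the operation shown
(dst[i] += src[i] * mul) is the vector_fmac_scalar semantics, and the
two inner groups of 8 only stand in for the 8 elements the SSE loop
body handled before unrolling.

```c
#include <assert.h>

/* Reference: one element per iteration. */
static void fmac_scalar_ref(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Unrolled: 16 elements per iteration, relying on the guarantee that
 * len is a multiple of 16. Hypothetical illustration only. */
static void fmac_scalar_unrolled(float *dst, const float *src, float mul, int len)
{
    assert(len % 16 == 0);
    for (int i = 0; i < len; i += 16) {
        /* first group of 8 (what one loop pass covered before) */
        for (int j = 0; j < 8; j++)
            dst[i + j] += src[i + j] * mul;
        /* second group of 8, added by the unrolling */
        for (int j = 8; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Both functions produce identical results for any len that is a multiple
of 16; the unrolled form simply halves the number of loop-control
iterations.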
* qatar/master:
lavfi doxy: add buffer{src,sink}.h to the main lavfi doxy group
Conflicts:
libavfilter/buffersink.h
libavfilter/buffersrc.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f758ea6e99af6ebd24bbe222898a921c222e5593':
buffersink: document special error codes returned from av_buffersink_get_frame
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '3fbad00714698f59c6326edfcc63db87f525e7c0':
utvideoenc: Enable support for multiple slices and use them
Conflicts:
libavcodec/utvideoenc.c
tests/fate/utvideo.mak
See: efec857c9f70113bdbcc18e03a5bcadcdca9f9a1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The official Ut Video decoder only threads with slices, thus until
now any files encoded by the libavcodec encoder have only been
decodable with a single thread. The default slice count is now
set to subsampled_height / 120.
Also sets slices to 1 for the Ut Video encoder tests to keep them
green.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
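A small sketch of the default slice count rule quoted above, for
illustration only. The function name default_slice_count is
hypothetical, deriving subsampled_height as height >> log2_chroma_h is
an assumption, and so is the clamp to at least one slice; only the
division by 120 comes from the message.

```c
#include <assert.h>

/* Hypothetical helper: default slice count = subsampled_height / 120. */
static int default_slice_count(int height, int log2_chroma_h)
{
    assert(height > 0 && log2_chroma_h >= 0);
    /* assumption: subsampled height is the chroma plane height */
    int subsampled_height = height >> log2_chroma_h;
    int slices = subsampled_height / 120;   /* integer division */
    /* assumption: never return fewer than one slice */
    return slices > 0 ? slices : 1;
}
```

Under these assumptions a 1080-line 4:2:0 frame (log2_chroma_h = 1)
gives 540 / 120 = 4 slices, and a 1080-line frame without vertical
subsampling gives 9.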
Altivec can only load naturally aligned vectors. To handle possibly
unaligned data a second vector is loaded from an offset of the original
location and the data is recovered through a vector permutation.
Overreads are minimal if the offset for the second load points to the
last element of the data. For loading eight 8-bit pixels this offset is
7, and overreads are reduced from 16 bytes to 8 bytes if the pixels are
64-bit aligned. For unaligned pixels the overread is reduced from
23 bytes to 15 bytes in the worst case.
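The overread figures above can be checked with a short calculation,
sketched here in C. The helper names are hypothetical. The previous
second-load offset of 15 is an assumption: the message does not state
it, but it reproduces the quoted 16 and 23 byte figures. Only bytes
read past the end of the wanted data are counted.

```c
#include <assert.h>

/* Bytes read past the end of n wanted bytes starting at addr, when the
 * second load fetches the aligned 16-byte vector containing
 * addr + offset. */
static int overread_bytes(int addr, int n, int offset)
{
    assert(addr >= 0 && n > 0 && offset >= n - 1);
    /* end of the aligned vector that holds byte addr + offset */
    int second_end = (addr + offset) / 16 * 16 + 16;
    return second_end - (addr + n);
}

/* Worst case over all start positions within a 16-byte vector that
 * satisfy the given alignment (in bytes). */
static int worst_overread(int n, int offset, int align)
{
    int worst = 0;
    for (int addr = 0; addr < 16; addr += align) {
        int o = overread_bytes(addr, n, offset);
        if (o > worst)
            worst = o;
    }
    return worst;
}
```

With n = 8, worst_overread(8, 15, 8) and worst_overread(8, 7, 8) give
the 64-bit aligned figures, and an alignment of 1 gives the unaligned
worst cases.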
The official Ut Video decoder only threads with slices, thus until
now any files encoded by the libavcodec encoder have only been
decodable with a single thread. The default slice count is now
set to subsampled_height / 120.
Also sets slices to 1 for the Ut Video encoder tests to keep them
green.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '19d3127867f001d007f98bc8c5a85c5409abf788':
doxygen: Set EXAMPLE_PATH from within doxy-wrapper.sh
Conflicts:
doc/Doxyfile
doc/doxy-wrapper.sh
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'f1f42cfc66804907d1df9231469e4296472bb0f5':
build: Do not pass HTML snippets and stylesheet as input to Doxygen
Conflicts:
doc/Makefile
See: 0f378d86321e4d14153a28d5e74c3ff0f99b1a20
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e6c175dfd51e4b0e6deeae72cd8a161b22af3492':
Doxyfile: Only set HTML_{HEADER|FOOTER|STYLESHEET} from doxy_wrapper.sh
Conflicts:
doc/Doxyfile
See: 7d0ca5b7e43676cc23834ccd19d40744f7328b77
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '304e916a92bc17385a485bec2f957e192257ddb6':
h264_sei: name buffering period type consistently
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '3a0576702825423abecb32627c530dbc4c0f73bc':
h264: store current_sps_id inside the current sps
Conflicts:
libavcodec/h264.c
libavcodec/h264_ps.c
The current_sps_id is not removed as it is used in security-related code.
Merged-by: Michael Niedermayer <michaelni@gmx.at>