ffmpeg/libavutil
Ganesh Ajjanagadde 68e79b27a5 avutil/lls: speed up performance of solve_lls
This is a trivial rewrite of the loops that results in better
prefetching and hence better cache efficiency. Essentially, the problem is
that modern prefetching logic is based on finite-state Markov memory, a
reasonable assumption that is also used elsewhere in CPUs, for instance in
branch predictors.

The surrounding loops all iterate forward through the array, which trains
the prefetcher to fetch in the forward direction, but the intermediate
loop unnecessarily iterates backward.

The speedup is nontrivial. Benchmarks were obtained from 10^6 iterations
within solve_lls, timed with START_TIMER/STOP_TIMER (a decicycle is a
tenth of a cycle). File: tests/data/fate/flac-16-lpc-cholesky.err.
Hardware: x86-64, Haswell, GNU/Linux.

new:
  17291 decicycles in solve_lls, 2096706 runs,    446 skips
  17255 decicycles in solve_lls, 4193657 runs,    647 skips
  17231 decicycles in solve_lls, 8384997 runs,   3611 skips
  17189 decicycles in solve_lls,16771010 runs,   6206 skips
  17132 decicycles in solve_lls,33544757 runs,   9675 skips
  17092 decicycles in solve_lls,67092404 runs,  16460 skips
  17058 decicycles in solve_lls,134188213 runs,  29515 skips

old:
  18009 decicycles in solve_lls, 2096665 runs,    487 skips
  17805 decicycles in solve_lls, 4193320 runs,    984 skips
  17779 decicycles in solve_lls, 8386855 runs,   1753 skips
  18289 decicycles in solve_lls,16774280 runs,   2936 skips
  18158 decicycles in solve_lls,33548104 runs,   6328 skips
  18420 decicycles in solve_lls,67091793 runs,  17071 skips
  18310 decicycles in solve_lls,134187219 runs,  30509 skips
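To put the two logs side by side, the relative saving for each printed sample is (old - new) / old. The small helper below computes it from the decicycle figures copied from the logs above. The array and function names are hypothetical.

```c
/* Decicycle averages copied from the "new" and "old" logs above,
 * one entry per printed line, in the same order. */
static const int new_dc[] = {17291, 17255, 17231, 17189, 17132, 17092, 17058};
static const int old_dc[] = {18009, 17805, 17779, 18289, 18158, 18420, 18310};

#define N_SAMPLES (int)(sizeof(new_dc) / sizeof(new_dc[0]))

/* Percentage of time saved by the new loop order for sample i. */
static double saving_pct(int i)
{
    return 100.0 * (old_dc[i] - new_dc[i]) / old_dc[i];
}

/* Convert decicycles to cycles per call (1 decicycle = 0.1 cycle). */
static double to_cycles(int decicycles)
{
    return decicycles / 10.0;
}
```

For the largest run count, this gives about 1705.8 cycles per call for the new code versus 1831.0 for the old, a saving of roughly 6.8%. Across all seven samples the saving ranges from about 3% to 7%.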

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-26 09:20:46 -05:00
aarch64 Merge commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793' 2014-12-09 12:08:29 +01:00
arm avutil/attributes: add AV_GCC_VERSION_AT_MOST 2015-09-18 12:41:29 -03:00
avr32
bfin Merge commit '880e2aa23645ed9871c66ee1cbd00f93c72d2d73' 2014-06-02 19:38:01 +02:00
mips mips: intreadwrite: Only execute that code for mips r1 or r2 2015-09-29 11:10:37 +02:00
ppc avutil/ppc/cpu: add include avassert.h 2015-06-05 19:12:58 +02:00
sh4
tomi
x86 avutil/x86/bswap: Remove warning about bswap intrinsics with msvc. 2015-11-23 23:03:32 +11:00
adler32.c avutil/adler32: Fix data type in test code 2015-06-19 02:25:48 +02:00
adler32.h adler32: Fix doxy group definition 2014-04-07 01:31:02 +02:00
aes_internal.h lavu/aes: align AVAES struct members 2015-10-28 04:23:14 -05:00
aes.c lavu/aes: test CBC functionality 2015-10-28 09:38:21 -05:00
aes.h lavu: Drop deprecated context size variables 2015-08-28 16:04:27 +02:00
atomic_gcc.h lavu/atomic: add support for the new memory model aware gcc built-ins 2014-10-29 14:09:58 -03:00
atomic_suncc.h
atomic_win32.h msvc: Fix compilation errors due to header include order. 2014-11-27 12:40:18 +01:00
atomic.c avutil/atomic: reuse ret to avoid dereferencing twice the same value. 2014-12-27 22:14:23 +01:00
atomic.h Merge remote-tracking branch 'qatar/master' 2013-12-20 13:16:56 +01:00
attributes.h avutil/attributes: add av_warn_unused_result 2015-10-05 19:30:20 +02:00
audio_fifo.c avfilter: add showfreqs filter 2015-08-19 16:15:13 +00:00
audio_fifo.h avutil/audio_fifo: add av_warn_unused_result 2015-10-28 23:05:31 -04:00
avassert.h
avstring.c avutil/avstring: Inline some tiny functions 2015-10-03 13:45:37 +02:00
avstring.h avutil/avstring: add av_warn_unused_result 2015-10-27 23:16:09 -04:00
avutil.h doxygen: Remove lavu_internal group 2015-08-22 10:07:05 -07:00
avutilres.rc Add Windows resource file support for shared libraries 2013-12-05 23:42:07 +01:00
base64.c Merge commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec' 2014-01-26 01:54:55 +01:00
base64.h
blowfish.c avutil: undo FF_API_CRYPTO_CONTEXT deprecation 2015-10-16 19:13:38 -03:00
blowfish.h avutil: undo FF_API_CRYPTO_CONTEXT deprecation 2015-10-16 19:13:38 -03:00
bprint.c avutil & avdevice: remove av_bprint_fd_contents() 2014-07-15 21:49:56 +02:00
bprint.h avutil/bprint: C++ compatible AVBPrint definition. 2014-11-29 03:51:35 +01:00
bswap.h Fix compile error on bfin. 2014-08-05 01:54:47 +02:00
buffer_internal.h Merge commit 'fbd6c97f9ca858140df16dd07200ea0d4bdc1a83' 2014-11-27 23:42:16 +01:00
buffer.c avutil/buffer: Avoid moving the AVBufferRef to a new place in memory in av_buffer_make_writable() 2015-03-12 02:15:28 +01:00
buffer.h Revert "lavu/buffer: add release function" 2014-03-06 03:23:40 +01:00
camellia.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
camellia.h avutil/camellia: fix documentation for av_camellia_crypt() 2015-01-02 21:23:45 +01:00
cast5.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
cast5.h avutil/cast5: update Doxygen for av_cast5_init with return information 2015-10-15 22:32:58 -04:00
channel_layout.c libavutil/channel_layout: Check strtol*() for failure 2015-11-05 19:28:19 +01:00
channel_layout.h Merge commit 'e23f84d9652474353d8bbc42787a56ec1991908f' 2015-08-24 10:40:24 +02:00
color_utils.c avutil/color_utils: Add basic transfer functions for each AVColorTransferCharacteristic 2015-09-10 23:53:05 +02:00
color_utils.h avutil/color_utils: Add basic transfer functions for each AVColorTransferCharacteristic 2015-09-10 23:53:05 +02:00
colorspace.h avutil/colorspace: Remove RGB_TO_Y/U/V 2015-06-06 18:21:01 +02:00
common.h avutil: Move av_rint64_clip_* to internal.h 2015-11-15 03:47:09 +01:00
cpu_internal.h Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4' 2015-05-31 23:59:48 +02:00
cpu.c lavu: add AESNI CPU flag 2015-10-28 04:23:14 -05:00
cpu.h lavu: add AESNI CPU flag 2015-10-28 04:23:14 -05:00
crc.c avutil/crc: use EINVAL instead of -1 for the return code of av_crc_init() 2015-10-16 03:24:36 +02:00
crc.h Merge commit '0983d48111f578e17e8c1967d25ce593fce62b63' 2014-04-17 22:38:51 +02:00
des.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
des.h avutil: undo FF_API_CRYPTO_CONTEXT deprecation 2015-10-16 19:13:38 -03:00
dict.c Merge commit '11c5f438ff83da5040e85bfa6299f56b321d32ef' 2015-10-14 14:01:11 +02:00
dict.h Merge commit '11c5f438ff83da5040e85bfa6299f56b321d32ef' 2015-10-14 14:01:11 +02:00
display.c Merge commit 'e4fe535d12f4f30df2dd672e30304af112a5a827' 2015-03-24 01:14:31 +01:00
display.h Merge commit 'e4fe535d12f4f30df2dd672e30304af112a5a827' 2015-03-24 01:14:31 +01:00
downmix_info.c Merge commit 'c98f3169bfb578c1a4e407b44524f0bfa3b4dc0c' 2014-02-16 02:05:29 +01:00
downmix_info.h fix spelling errors 2014-07-12 22:33:27 +02:00
dynarray.h fix spelling errors 2014-07-12 22:33:27 +02:00
error.c avutil/error: list most common error code in error_entries when strerror_r() is unavailable 2015-02-10 23:02:24 +01:00
error.h avutil/error: Introduce new error codes for 4XX and 5XX replies from remote servers 2014-10-19 22:32:14 +02:00
eval.c avutil/eval: change sqrt to hypot 2015-11-21 08:51:49 -05:00
eval.h avutil/eval: minor typo 2015-11-01 19:35:01 -05:00
fifo.c avutil/fifo: add function av_fifo_generic_peek_at() 2015-10-14 20:23:58 +02:00
fifo.h avutil/fifo: add function av_fifo_generic_peek_at() 2015-10-14 20:23:58 +02:00
file_open.c avutil/file_open: avoid file handle inheritance on Windows 2015-11-02 17:40:49 +01:00
file.c Merge commit 'bf704132a51f5d838365158331d4e535e1df4c8e' 2015-02-14 21:27:44 +01:00
file.h avutil/file: add av_warn_unused_result to av_file_map 2015-10-16 17:18:39 -04:00
fixed_dsp.c avutil/fixed_dsp: remove ff_ prefix from static function 2015-06-20 03:39:09 -03:00
fixed_dsp.h libavutil: Add new fixed dsp functions. 2015-06-03 22:50:53 +02:00
float_dsp.c avutil: merge avpriv_float_dsp_init into avpriv_float_dsp_alloc 2015-10-21 00:24:58 +02:00
float_dsp.h avutil: merge avpriv_float_dsp_init into avpriv_float_dsp_alloc 2015-10-21 00:24:58 +02:00
frame.c Merge commit '1aa24df74c052a73175c43e57d35b4835e537ec8' 2015-10-03 09:52:39 +02:00
frame.h Merge commit '1aa24df74c052a73175c43e57d35b4835e537ec8' 2015-10-03 09:52:39 +02:00
hash.c lavu/hash.c: Add missing "static const". 2014-08-31 10:33:02 +02:00
hash.h lavu/hash: add hash_final helpers. 2014-04-29 13:24:11 +02:00
hmac.c lavu/hmac: remove deprecated type ids 2015-09-05 18:07:20 +02:00
hmac.h lavu/hmac: remove deprecated type ids 2015-09-05 18:07:20 +02:00
imgutils.c imgutils: Use designated initializers for AVClass 2015-11-23 18:30:25 -08:00
imgutils.h Replace a few leftover instances of enum PixelFormat with enum AVPixelFormat 2015-03-17 23:53:33 +02:00
integer.c
integer.h
internal.h avutil: Move av_rint64_clip_* to internal.h 2015-11-15 03:47:09 +01:00
intfloat.h Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
intmath.c intmath: remove av_ctz. 2015-10-11 18:03:10 -04:00
intmath.h avutil/intmath: fix undefined behavior in ff_ctzll_c() 2015-10-22 14:10:42 +02:00
intreadwrite.h libavutil: document side effects of macros 2014-07-19 14:55:46 +02:00
lfg.c
lfg.h
libavutil.v lavu: stop exporting internal functions 2014-08-12 04:35:52 +02:00
libm.h avutil/libm: fix isnan compatibility hack 2015-11-24 21:33:13 -05:00
lls.c avutil/lls: speed up performance of solve_lls 2015-11-26 09:20:46 -05:00
lls.h lavu: Drop deprecated private lls functions 2015-08-28 16:04:27 +02:00
log2_tab.c
log.c avutil/log: fix zero length gnu_printf format string warning 2015-09-17 18:58:01 +02:00
log.h avutil/log: modify AV_LOG_MAX_OFFSET for AV_LOG_TRACE 2015-06-26 14:02:35 +02:00
lzo.c avutil/lzo: fix resource leak 2014-10-11 12:15:26 +02:00
lzo.h
macros.h Merge remote-tracking branch 'qatar/master' 2013-12-30 11:23:32 +01:00
Makefile avutil: install des.h, rc4.h and tree.h as public headers 2015-10-21 00:23:10 +02:00
mathematics.c avutil/mathematics: make av_gcd more robust 2015-10-29 19:13:55 -04:00
mathematics.h avutil/mathematics: correct documentation for av_gcd 2015-10-30 13:42:04 -04:00
md5.c lavu: Drop deprecated context size variables 2015-08-28 16:04:27 +02:00
md5.h lavu: Drop deprecated context size variables 2015-08-28 16:04:27 +02:00
mem_internal.h avutil/mem_internal: add missing header includes 2015-07-13 21:54:15 -03:00
mem.c avutil/mem: Add av_fast_mallocz() 2015-11-18 22:05:16 +01:00
mem.h avutil/mem: Add av_fast_mallocz() 2015-11-18 22:05:16 +01:00
motion_vector.h avutil/motion_vector: export subpel motion information 2015-11-23 10:55:15 +01:00
murmur3.c avutil/murmur3: Add () to protect the ROT() arguments 2015-02-17 00:18:15 +01:00
murmur3.h Add 128 bit murmur3 hash function. 2013-05-13 21:42:37 +02:00
opencl_internal.c lavu: rename ff_opencl_set_parameter() to avpriv_opencl_set_parameter() 2014-08-12 03:49:45 +02:00
opencl_internal.h avutil/opencl_internal: add av_warn_unused_result 2015-10-31 10:40:54 -04:00
opencl.c opencl: Use "opencl" as log context name 2015-10-17 01:16:50 -07:00
opencl.h opencl: Force the use of 1.2 APIs 2015-10-17 01:16:50 -07:00
opt.c lavu/opt: enhance printing durations. 2015-11-07 16:04:09 +01:00
opt.h lavu/opt: add flag to return NULL when applicable in av_opt_get 2015-10-09 04:12:57 -05:00
parseutils.c Merge commit '219b39a71a5694b1c14a07b86477f665a5b6849b' 2015-07-21 16:55:39 +02:00
parseutils.h Merge commit '27f274628234c1f934b9a6a6380ed567c1b4ceae' 2015-04-07 20:46:25 +02:00
pca.c avutil/pca: Check for av_malloc* failures 2015-03-30 04:37:42 +02:00
pca.h avutil/pca: Make argument of ff_pca_add() const 2014-09-28 16:17:18 +02:00
pixdesc.c pixfmt: Add new SMPTE color primaries and transfer characteristic values 2015-09-17 10:31:43 +02:00
pixdesc.h Merge commit '7b02cb29d9d60cdd5ef321043d11d02023e7dc8f' 2015-09-12 13:03:04 +02:00
pixelutils.c avutil: check pixdescs in a different place 2015-02-10 15:45:02 +01:00
pixelutils.h avutil: add pixelutils API 2014-08-05 21:05:52 +02:00
pixfmt.h pixfmt: Add new SMPTE color primaries and transfer characteristic values 2015-09-17 10:31:43 +02:00
qsort.h avutil/qsort: use the do while form for AV_QSORT, AV_MSORT 2015-10-23 08:41:16 -04:00
random_seed.c msvc: fix implicitly declared read/close. 2014-08-02 14:52:17 +02:00
random_seed.h
rational.c avutil/rational: use frexp rather than ad-hoc log to get floating point exponent 2015-10-30 23:18:43 -04:00
rational.h avutil: Add av_q2intfloat() 2015-05-26 18:31:53 +02:00
rc4.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
rc4.h avutil: undo FF_API_CRYPTO_CONTEXT deprecation 2015-10-16 19:13:38 -03:00
replaygain.h Merge commit '8542f9c4f17125d483c40c0c5723842f1c982f81' 2014-04-04 22:52:12 +02:00
reverse.c avutil: add ff_reverse as av_reverse replacement 2015-08-12 00:14:14 +02:00
ripemd.c avutil/ripemd: make rol macro more robust by adding parentheses 2015-10-28 21:42:15 -04:00
ripemd.h lavu: Add RIPEMD hashing 2013-06-15 18:54:01 -03:00
samplefmt.c avutil: remove obsolete FF_API_SAMPLES_UTILS_RETURN_ZERO cruft 2014-10-05 17:09:56 -03:00
samplefmt.h avutil: remove obsolete FF_API_GET_BITS_PER_SAMPLE_FMT cruft 2014-10-05 17:09:49 -03:00
sha512.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
sha512.h lavu: Add SHA-2 512 hashing 2013-06-02 11:27:19 +02:00
sha.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
sha.h lavu: Drop deprecated context size variables 2015-08-28 16:04:27 +02:00
softfloat_tables.h avutil/softfloat_tables: add missing stdint.h include 2015-04-30 17:38:41 -03:00
softfloat.c avutil/softfloat: Include negative numbers in cmp/gt tests 2015-11-08 15:04:05 +01:00
softfloat.h avutil/softfloat: use abort() instead of av_assert0(0) 2015-11-10 13:37:24 -03:00
stereo3d.c Merge commit '159a06dfc83d189f753c4583583ddfb571552ff5' 2014-08-14 00:17:47 +02:00
stereo3d.h Merge commit '440842c4eb1d7709654ec97cd687663d11ef499c' 2014-06-19 23:47:10 +02:00
tea.c Add support for TEA (Tiny Encryption Algorithm) 2015-07-21 23:10:44 +02:00
tea.h Add support for TEA (Tiny Encryption Algorithm) 2015-07-21 23:10:44 +02:00
thread.h Merge commit 'c53e796f8b69799b7ad6d28fbab981d37edf1bc9' 2015-10-14 23:02:35 +02:00
threadmessage.c lavu: add thread message API. 2014-05-26 11:40:15 +02:00
threadmessage.h lavu: add thread message API. 2014-05-26 11:40:15 +02:00
time_internal.h avutil/time_internal: do not attempt to override *time_r() macros 2014-11-05 18:44:15 +01:00
time.c Merge commit '1bd0bdcdc236099d5c0d179696951f35f5310fa5' 2014-10-24 11:06:56 +02:00
time.h Merge commit '1bd0bdcdc236099d5c0d179696951f35f5310fa5' 2014-10-24 11:06:56 +02:00
timecode.c timecode: Support HFR values 2015-10-26 15:05:26 +01:00
timecode.h
timer.h avutil/timer: give each printed value of STOP_TIMER a fixed length 2015-03-27 04:44:58 +01:00
timestamp.h avutil/timestamp: Warn about missing __STDC_FORMAT_MACROS for C++ use 2014-03-13 17:32:15 +01:00
tree.c avutil/tree: clean up pointer incompatibility warnings 2015-10-25 12:45:10 -04:00
tree.h avutil/tree: Document the guaranteed ordering of compare arguments for av_tree_find() 2015-10-25 17:28:47 +01:00
twofish.c avutil: use EINVAL instead of -1 for the return code of crypto related init functions 2015-10-18 15:17:58 -04:00
twofish.h libavutil: Added twofish symmetric block cipher 2015-01-29 01:56:11 +01:00
utf8.c avutil/utf8: put under #ifdef TEST 2013-11-22 17:16:11 +01:00
utils.c lavu: disable wrong value check in get_version() upon api bump. 2015-08-18 15:57:20 -04:00
version.h avutil/motion_vector: export subpel motion information 2015-11-23 10:55:15 +01:00
wchar_filename.h avutil/wchar_filename: add av_warn_unused_result 2015-10-30 13:47:28 -04:00
x86_cpu.h
xga_font_data.c
xga_font_data.h
xtea.c Merge commit '588b6215b4c74945994eb9636b0699028c069ed2' 2015-11-22 14:29:09 +00:00
xtea.h Merge commit '588b6215b4c74945994eb9636b0699028c069ed2' 2015-11-22 14:29:09 +00:00