3933 Commits

Author SHA1 Message Date
Ganesh Ajjanagadde
68e79b27a5 avutil/lls: speed up performance of solve_lls
This is a trivial rewrite of the loops that results in better
prefetching and associated cache efficiency. Essentially, the problem is
that modern prefetching logic is based on finite state Markov memory, a reasonable
assumption that is also used elsewhere in CPUs, for instance in branch
predictors.

Surrounding loops all iterate forward through the array, making the
predictor think of prefetching in the forward direction, but the
intermediate loop is unnecessarily in the backward direction.

Speedup is nontrivial. Benchmarks obtained by 10^6 iterations within
solve_lls, with START/STOP_TIMER. File is tests/data/fate/flac-16-lpc-cholesky.err.
Hardware: x86-64, Haswell, GNU/Linux.

new:
  17291 decicycles in solve_lls, 2096706 runs,    446 skips
  17255 decicycles in solve_lls, 4193657 runs,    647 skips
  17231 decicycles in solve_lls, 8384997 runs,   3611 skips
  17189 decicycles in solve_lls,16771010 runs,   6206 skips
  17132 decicycles in solve_lls,33544757 runs,   9675 skips
  17092 decicycles in solve_lls,67092404 runs,  16460 skips
  17058 decicycles in solve_lls,134188213 runs,  29515 skips

old:
  18009 decicycles in solve_lls, 2096665 runs,    487 skips
  17805 decicycles in solve_lls, 4193320 runs,    984 skips
  17779 decicycles in solve_lls, 8386855 runs,   1753 skips
  18289 decicycles in solve_lls,16774280 runs,   2936 skips
  18158 decicycles in solve_lls,33548104 runs,   6328 skips
  18420 decicycles in solve_lls,67091793 runs,  17071 skips
  18310 decicycles in solve_lls,134187219 runs,  30509 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-26 09:20:46 -05:00
Ganesh Ajjanagadde
29af74e4e3 avutil/libm: fix isnan compatibility hack
Commit 14ea4151d7 had a bug in that the
conversion of the uint64_t result to an int (the return signature) would
lead to implementation defined behavior, and in this case simply
returned 0 for NAN. A fix via AND'ing the result with 1 does the trick,
simply by ensuring a 0 or 1 return value.

Patch tested with FATE on x86-64, GNU/Linux by forcing the compatibility
code via an ifdef hack suggested by Michael.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-24 21:33:13 -05:00
Timothy Gu
7c91b3021c imgutils: Use designated initializers for AVClass
More readable and less breakable.
2015-11-23 18:30:25 -08:00
Matt Oliver
e9ec28c95e avutil/x86/bswap: Remove warning about bswap intrinsics with msvc.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-23 23:03:32 +11:00
Clément Bœsch
56bdf61baa avutil/motion_vector: export subpel motion information
FATE test changes because of the switch from shift to division.
2015-11-23 10:55:15 +01:00
Derek Buitenhuis
e12f403678 Merge commit '588b6215b4c74945994eb9636b0699028c069ed2'
* commit '588b6215b4c74945994eb9636b0699028c069ed2':
  rtmpcrypt: Do the xtea decryption in little endian mode
  xtea: Add functions for little endian mode

  Conflicts:
      libavutil/xtea.c

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2015-11-22 14:29:09 +00:00
Ganesh Ajjanagadde
401c93ddb7 avutil/eval: change sqrt to hypot
This improves the mathematical behavior of hypotenuse computation.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-21 08:51:49 -05:00
Ganesh Ajjanagadde
275aca8fba configure+libm.h: add hypot emulation
It is known that the naive sqrt(x*x + y*y) approach for computing the
hypotenuse suffers from overflow and accuracy issues, see e.g.
http://www.johndcook.com/blog/2010/06/02/whats-so-hard-about-finding-a-hypotenuse/.
This adds hypot support to FFmpeg, a C99 function.

On platforms without hypot, this patch provides a reasonable workaround that,
although not as accurate as GNU libm, is readable and does not suffer
from the overflow issue. Improvements can be made separately.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-21 08:51:49 -05:00
Ganesh Ajjanagadde
14ea4151d7 avutil/libm: correct isnan, isinf compat hacks
isnan and isinf are actually macros as per the standard. In particular,
the existing implementation has incorrect signature. Furthermore, this
results in undefined behavior for e.g. double values outside float range
as per the standard.

This patch corrects the undefined behavior for all usage within FFmpeg.

Note that long double is not handled as it is not used in FFmpeg.
Furthermore, even if at some point long double gets used, it is likely
not needed to modify the macro in practice for usage in FFmpeg. See
below for analysis.

Getting long double to work strictly per the spec is significantly harder
since a long double may be an IEEE 128 bit quad (very rare), 80 bit
extended precision value (on GCC/Clang), or simply double (on recent Microsoft).
On the other hand, any potential future usage of long double is likely
for precision (when a platform offers extra precision) and not for range, since
the range anyway varies and is not as portable as IEEE 754 single/double
precision. In such cases, the implicit cast to a double is well defined
and isinf and isnan should work as intended.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-21 08:51:49 -05:00
Derek Buitenhuis
3d2363fbf9 Merge commit '1fc94724f1fd52944bb5ae571475c621da4b77a0'
* commit '1fc94724f1fd52944bb5ae571475c621da4b77a0':
  xtea: Clarify that the current API works in big endian mode

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2015-11-19 14:14:20 +00:00
Michael Niedermayer
fc91eeab0b avutil/mem: Add av_fast_mallocz()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-18 22:05:16 +01:00
Michael Niedermayer
5c3dee7dad avutil: Move av_rint64_clip_* to internal.h
The function is renamed to ff_rint64_clip()

This should avoid build failures on VS2012
Feel free to change this to a different solution

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-15 03:47:09 +01:00
Ganesh Ajjanagadde
6f520ce1a6 avutil/common: add av_rint64_clip
The rationale for this function is reflected in the documentation for
it, and is copied here:

Clip a double value into the long long amin-amax range.
This function is needed because conversion of floating point to integers when
it does not fit in the integer's representation does not necessarily saturate
correctly (usually converted to a cvttsd2si on x86) which saturates numbers
> INT64_MAX to INT64_MIN. The standard marks such conversions as undefined
behavior, allowing this sort of mathematically bogus conversion. This provides
a safe alternative that is slower obviously but assures safety and better
mathematical behavior.
API:
@param a value to clip
@param amin minimum value of the clip range
@param amax maximum value of the clip range
@return clipped value

Note that a priori if one can guarantee from the calling side that the
double is in range, it is safe to simply do an explicit/implicit cast,
and that will be far faster. However, otherwise this function should be
used.

avutil minor version is bumped.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-13 21:48:16 -05:00
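
The saturating conversion described in 6f520ce1a6 can be sketched as below. This is an illustration under stated assumptions, not the av_rint64_clip source: the function name is hypothetical, the sketch truncates toward zero instead of rounding, and mapping NaN to amin is an arbitrary choice made here.

```c
#include <stdint.h>

/* Hypothetical saturating double -> int64_t conversion. */
static int64_t sketch_clip_double_to_int64(double a, int64_t amin, int64_t amax)
{
    int64_t res;

    if (a != a)                         /* NaN: arbitrary choice for the sketch */
        return amin;
    /* 2^63 and -2^63 are exactly representable as doubles, so these
     * range checks are exact. */
    if (a >= 9223372036854775808.0)
        return amax;
    if (a <= -9223372036854775808.0)
        return amin;

    res = (int64_t)a;                   /* in range, so the cast is defined */
    if (res > amax)
        return amax;
    if (res < amin)
        return amin;
    return res;
}
```

The range checks come first because the cast itself is only defined once the value is known to fit in int64_t.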
Martin Storsjö
92d107a171 xtea: Add functions for little endian mode
Signed-off-by: Martin Storsjö <martin@martin.st>
2015-11-13 21:53:54 +02:00
Martin Storsjö
1fc94724f1 xtea: Clarify that the current API works in big endian mode
Signed-off-by: Martin Storsjö <martin@martin.st>
2015-11-13 21:53:51 +02:00
Matt Oliver
58d32c00be avutil/x86/intmath: Fix intrinsic header include when using newer gcc with older icc.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-12 16:54:08 +11:00
Matt Oliver
9b9c9ef3b4 avutil/x86/bswap: Add msvc bswap intrinsics.
This adds msvc optimisations and fixes an error in icl, which would otherwise generate invalid code.

Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-12 16:53:44 +11:00
Matt Oliver
9105399060 avutil/x86/intmath: Disable use of tzcnt on older intel compilers.
ICC versions older than at least 12.1.6 don't have the tzcnt intrinsics.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-11 10:18:08 +11:00
James Almer
9f4a41bf99 avutil/softfloat: use abort() instead of av_assert0(0)
Fixes compilation of host tool aacps_fixed_tablegen.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-11-10 13:37:24 -03:00
Matt Oliver
f984174512 avutil/x86/intmath: Correct intrinsic headers for older compilers.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-09 21:40:33 +11:00
Andreas Cadhalpun
9ac61e73d0 softfloat: handle INT_MIN correctly in av_int2sf
Otherwise v=INT_MIN doesn't get normalized and thus triggers av_assert2
in other functions.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-11-08 21:06:40 +01:00
Andreas Cadhalpun
f3866a14c3 softfloat: assert when the argument of av_sqrt_sf is negative
The correct result can't be expressed in SoftFloat.
Currently it returns a random value from an out of bounds read.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2015-11-08 21:05:21 +01:00
Michael Niedermayer
955cdc43a3 avutil/softfloat: Include negative numbers in cmp/gt tests
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 15:04:05 +01:00
Michael Niedermayer
05b05a7a84 avutil/softfloat: Fix av_gt_sf() with large exponents try #2
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 15:03:28 +01:00
Michael Niedermayer
791ea23e57 avutil/softfloat: Add test for av_gt_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 15:02:05 +01:00
Michael Niedermayer
ecfb076141 avutil/softfloat: Extend the av_cmp_sf() test to cover a wider range of exponents
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:55:22 +01:00
Michael Niedermayer
cee3c9d29a avutil/softfloat: Fix overflows in shifts in av_cmp_sf() and av_gt_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:54:51 +01:00
Michael Niedermayer
df2a2117d2 avutil/softfloat: Add test for av_cmp_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:39:46 +01:00
Michael Niedermayer
596dfe7d6c avutil/softfloat: Add tests for exponent underflows
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:23:38 +01:00
Michael Niedermayer
046218b212 avutil/softfloat: Fix exponent underflow in av_div_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:23:38 +01:00
Michael Niedermayer
a1e3303fc0 avutil/softfloat: Fix exponent underflow in av_mul_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:23:38 +01:00
Michael Niedermayer
4135a2bfd6 avutil/softfloat: Fix typo in av_mul_sf() doxy
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 14:23:37 +01:00
Michael Niedermayer
4b6ad23609 Revert "avutil/softfloat: Check for MIN_EXP in av_sqrt_sf()"
This case should not be possible if the input has an exponent within
the valid range

This reverts commit 0269fb11e3.
2015-11-08 14:23:37 +01:00
Michael Niedermayer
0269fb11e3 avutil/softfloat: Check for MIN_EXP in av_sqrt_sf()
Otherwise the exponent could eventually underflow

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 13:39:06 +01:00
Michael Niedermayer
107db5abf3 avutil/softfloat: Correctly set the exponent for 0.0 in av_sqrt_sf()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 13:39:06 +01:00
Michael Niedermayer
a66b243d52 avutil/softfloat: FLOAT_0 should use MIN_EXP
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-08 03:57:02 +01:00
Nicolas George
47ea04ff10 lavu/opt: enhance printing durations.
Trim unneeded leading components and trailing zeros.
Move the formatting code into a separate function.
Also use the function to format the default value; it was previously
printed as a plain integer, inconsistent with the way it is parsed.
2015-11-07 16:04:09 +01:00
Michael Niedermayer
c9bfd6a8c3 libavutil/channel_layout: Check strtol*() for failure
Fixes assertion failure
Fixes: 4f5814bb15d2dda6fc18ef9791b13816/signal_sigabrt_7ffff6ae7cc9_65_7209d160d168b76f311be6cd64a548eb.wv

Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-05 19:28:19 +01:00
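
Checking strtol() for failure, as c9bfd6a8c3 does for channel layout parsing, generally looks like the sketch below. The helper name and the exact rejection rules (in particular rejecting trailing characters) are assumptions for illustration and may differ from what channel_layout.c accepts.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 integer and report failure
 * instead of continuing with a bogus value. Returns 0 on success. */
static int sketch_parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)            /* no digits were consumed */
        return -1;
    if (errno == ERANGE)     /* value does not fit in a long */
        return -1;
    if (*end != '\0')        /* trailing characters after the number */
        return -1;

    *out = v;
    return 0;
}
```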
Ganesh Ajjanagadde
265f83fd35 avutil/common: add FFDIFFSIGN macro
This is of use for defining comparator callbacks. Common approaches like
return x-y are not safe due to the risks of overflow.
Furthermore, the (x > y) - (x < y) trick is optimized to branchless
code.
This also documents this macro accordingly.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-03 16:28:12 -05:00
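
A comparator built on the (x > y) - (x < y) idiom from 265f83fd35 might look like this sketch. SKETCH_DIFFSIGN is a local stand-in defined here so the example is self-contained; the function names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Local stand-in for the macro described above: yields -1, 0 or 1. */
#define SKETCH_DIFFSIGN(x, y) (((x) > (y)) - ((x) < (y)))

/* qsort comparator; unlike "return x - y", this cannot overflow. */
static int sketch_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return SKETCH_DIFFSIGN(x, y);
}

static void sketch_sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof *v, sketch_cmp_int);
}
```

With x = INT_MIN and y = INT_MAX, the subtraction x - y would overflow, while the comparison form simply yields -1.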
Tobias Rapp
4746653466 avutil/file_open: avoid file handle inheritance on Windows
Avoids inheritance of file handles on Windows systems similar to the
O_CLOEXEC/FD_CLOEXEC flag on Linux.

Fixes file lock issues in Windows applications when a child process
is started with handle inheritance enabled (standard input/output
redirection) while an FFmpeg transcoding is running in the parent
process.

Links relevant to the subject:

https://msdn.microsoft.com/en-us/library/w7sa2b22.aspx

Describes the _wsopen() function and the O_NOINHERIT flag. File handles
opened by _wsopen() are inheritable by default.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425%28v=vs.85%29.aspx

Describes handle inheritance when creating new processes. Handle
inheritance must be enabled (bInheritHandles = TRUE) e.g. when you want
to pass handles for stdin/stdout via lpStartupInfo.

Signed-off-by: Tobias Rapp <t.rapp@noa-audio.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-11-02 17:40:49 +01:00
Ganesh Ajjanagadde
c03044c86a avutil/eval: minor typo
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-11-01 19:35:01 -05:00
Matt Oliver
bff009697d avutil/x86/intmath: Add missing header.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-11-01 02:11:29 +11:00
Ganesh Ajjanagadde
8a5b60a6b1 avutil/opencl_internal: add av_warn_unused_result
clSetKernelArg can return an error due to lack of memory (for instance):
https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clSetKernelArg.html.
Thus this error must be propagated.

Currently should not trigger warnings, but adds robustness.
Untested.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-31 10:40:54 -04:00
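
The effect of an unused-result annotation such as the one added in 8a5b60a6b1 can be sketched with the GCC/Clang warn_unused_result attribute. The macro and function names below are hypothetical, and the failing setter only simulates an error; no OpenCL call is made.

```c
/* Expands to the compiler attribute where available, else to nothing. */
#if defined(__GNUC__)
#    define sketch_warn_unused_result __attribute__((warn_unused_result))
#else
#    define sketch_warn_unused_result
#endif

/* Hypothetical setter that can fail, e.g. for lack of memory. */
sketch_warn_unused_result
static int sketch_set_arg(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/* The caller propagates the error instead of discarding it; calling
 * sketch_set_arg() and ignoring the result would trigger a warning. */
static int sketch_configure(int simulate_failure)
{
    int ret = sketch_set_arg(simulate_failure);
    if (ret < 0)
        return ret;
    return 0;
}
```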
Matt Oliver
6c6ac9cb17 avutil/x86/intmath: Use tzcnt in place of bsf.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-10-31 23:11:32 +11:00
Ganesh Ajjanagadde
8d9f86bd37 avutil/rational: use frexp rather than ad-hoc log to get floating point exponent
This simplifies and cleans up the code.
Furthermore, it is much faster due to absence of the slow log computation.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-30 23:18:43 -04:00
Ganesh Ajjanagadde
191f611906 avutil/wchar_filename: add av_warn_unused_result
Current code is fine, this just adds robustness.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-30 13:47:28 -04:00
Ganesh Ajjanagadde
20a30077c3 avutil/mathematics: correct documentation for av_gcd
av_gcd is now always defined regardless of input. This documents this
change in the "documented API". Two benefits (closely related):
1. The function is robust, and there is no need to worry about INT64_MIN, etc.

2. Clients of av_gcd, like av_reduce, can now be made fully correct. Currently,
av_reduce can trigger undefined behavior if e.g num is INT64_MIN due to
integer overflow in the FFABS. Furthermore, this undefined behavior is
completely undocumented, and could be a fuzzer's paradise. The FFABS was needed in the past as
av_gcd was undefined for negative inputs. In order to make av_reduce
robust, it is essential to guarantee that av_gcd works for all int64_t.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-30 13:42:04 -04:00
Ganesh Ajjanagadde
b7fb7c4542 avutil/mathematics: make av_gcd more robust
This ensures that no undefined behavior is invoked, while retaining
identical return values in all cases and at no loss of performance
(identical asm on clang and gcc).
Essentially, this patch exchanges undefined behavior with implementation
defined behavior, a strict improvement.

Rationale:
1. The ideal solution is to have the return type a uint64_t. This
unfortunately requires an API change.
2. The only pathological behavior happens if both arguments are
INT64_MIN, to the best of my knowledge. In such a case, the
implementation defined behavior is invoked in the sense that UINT64_MAX
is interpreted as INT64_MIN, which any reasonable implementation will
do. In any case, any usage where both arguments are INT64_MIN is a
fuzzer anyway.
3. Alternatives of checking, etc require branching and lose performance
for no concrete gain - no client cares about av_gcd's actual value when
both args are INT64_MIN. Even if it did, on sane platforms (e.g. all the
ones FFmpeg cares about), it produces a correct gcd, namely INT64_MIN.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-29 19:13:55 -04:00
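
The idea of avoiding undefined behavior for INT64_MIN in b7fb7c4542 can be illustrated by doing all arithmetic on unsigned magnitudes. This sketch uses the plain Euclidean algorithm and returns uint64_t (the "ideal" return type from point 1 of the rationale), so it is not the av_gcd implementation or its API; all names are hypothetical.

```c
#include <stdint.h>

/* Magnitude of a signed value, computed in unsigned arithmetic so that
 * INT64_MIN does not overflow the way a signed negation would. */
static uint64_t sketch_magnitude(int64_t a)
{
    return a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
}

/* Euclidean gcd on magnitudes; defined for every pair of int64_t. */
static uint64_t sketch_gcd(int64_t a, int64_t b)
{
    uint64_t u = sketch_magnitude(a);
    uint64_t v = sketch_magnitude(b);

    while (v != 0) {
        uint64_t t = u % v;
        u = v;
        v = t;
    }
    return u;
}
```

When both arguments are INT64_MIN the result is 2^63, which fits in uint64_t but not in int64_t; that is the single case the commit message singles out as implementation-defined under the existing signed return type.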
Ganesh Ajjanagadde
dd36749557 avutil/audio_fifo: add av_warn_unused_result
This one should not trigger any warnings, but will be useful for future
robustness.

Strictly speaking, one could check the size after the call by examining
the structure instead of the return value. Such a use case is highly
unusual, and this commit may be easily reverted if there is a legitimate
need of such use.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-28 23:05:31 -04:00
Ganesh Ajjanagadde
fab1562a50 avutil/ripemd: make rol macro more robust by adding parentheses
This ensures that the macro remains correct in the sense of allowing
expressions for value and bits, by placing the value and bits expressions within
parentheses.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-10-28 21:42:15 -04:00
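
Why the parentheses in fab1562a50 matter can be seen in a small sketch. The two macro shapes below are assumptions about a typical 32-bit rotate-left macro before and after such a fix, not copies of the ripemd source, and the demo functions are hypothetical.

```c
#include <stdint.h>

/* Arguments are pasted in unparenthesized. */
#define SKETCH_ROL_UNSAFE(value, bits) ((value << bits) | (value >> (32 - bits)))
/* Every use of an argument is wrapped in parentheses. */
#define SKETCH_ROL_SAFE(value, bits)   (((value) << (bits)) | ((value) >> (32 - (bits))))

static uint32_t sketch_rol_unsafe_demo(uint32_t a, uint32_t b)
{
    /* Expands to ((a | b << 4) | (a | b >> (32 - 4))): the shifts bind
     * tighter than |, so only b is shifted. */
    return SKETCH_ROL_UNSAFE(a | b, 4);
}

static uint32_t sketch_rol_safe_demo(uint32_t a, uint32_t b)
{
    /* Rotates the whole expression (a | b) left by 4 bits. */
    return SKETCH_ROL_SAFE(a | b, 4);
}
```

For a = 0x1 and b = 0x10 the safe form rotates 0x11 to 0x110, while the unsafe form evaluates to 0x101.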