756 lines
28 KiB
Plaintext
756 lines
28 KiB
Plaintext
|
[/
|
||
|
Copyright 2011 - 2020 John Maddock.
|
||
|
Copyright 2013 - 2019 Paul A. Bristow.
|
||
|
Copyright 2013 Christopher Kormanyos.
|
||
|
|
||
|
Distributed under the Boost Software License, Version 1.0.
|
||
|
(See accompanying file LICENSE_1_0.txt or copy at
|
||
|
http://www.boost.org/LICENSE_1_0.txt).
|
||
|
]
|
||
|
|
||
|
[section:limits Numeric Limits]
|
||
|
|
||
|
Boost.Multiprecision tries hard to implement `std::numeric_limits` for all types
|
||
|
as far as possible and meaningful because experience with Boost.Math has shown that this aids portability.
|
||
|
|
||
|
The [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf C++ standard library]
|
||
|
defines `std::numeric_limits` in section 18.3.2.
|
||
|
|
||
|
This in turn refers to the C standard
|
||
|
[@http://www.open-std.org/jtc1/sc22/wg11/docs/n507.pdf SC22/WG11 N507 DRAFT INTERNATIONAL ISO/IEC STANDARD
|
||
|
WD 10967-1]
|
||
|
Information technology Language independent arithmetic Part 1: Integer and floating-point arithmetic.
|
||
|
|
||
|
That C Standard in turn refers to
|
||
|
|
||
|
[@https://doi.org/10.1109/IEEESTD.1985.82928 IEEE754 IEEE Standard for Binary
|
||
|
Floating-Point Arithmetic]
|
||
|
|
||
|
There is a useful summary of `std::numeric_limits` at
|
||
|
[@http://www.cplusplus.com/reference/limits/numeric_limits/ C++ reference].
|
||
|
|
||
|
The chosen backend often determines how completely `std::numeric_limits` is available.
|
||
|
|
||
|
Compiler options, processor type, and definition of macros or assembler instructions to control denormal numbers will alter
|
||
|
the values in the tables given below.
|
||
|
|
||
|
[warning GMP's extendable floatin-point `mpf_t` does not have a concept of overflow:
|
||
|
operations that lead to overflow eventually run of out of resources
|
||
|
and terminate with stack overflow (often after several seconds).]
|
||
|
|
||
|
[section:constants std::numeric_limits<> constants]
|
||
|
|
||
|
[h4 is_specialized]
|
||
|
|
||
|
`true` for all arithmetic types (integer, floating and fixed-point)
|
||
|
for which `std::numeric_limits<T>::numeric_limits` is specialized.
|
||
|
|
||
|
A typical test is
|
||
|
|
||
|
if (std::numeric_limits<T>::is_specialized == false)
|
||
|
{
|
||
|
std::cout << "type " << typeid(T).name() << " is not specialized for std::numeric_limits!" << std::endl;
|
||
|
// ...
|
||
|
}
|
||
|
|
||
|
Typically `numeric_limits<T>::is_specialized` is `true` for all `T` where the compile-time constant
|
||
|
members of `numeric_limits` are indeed known at compile time, and don't vary at runtime. For example
|
||
|
floating-point types with runtime-variable precision such as `mpfr_float` have no `numeric_limits`
|
||
|
specialization as it would be impossible to define all the members at compile time. In contrast
|
||
|
the precision of a type such as `mpfr_float_50` is known at compile time, and so it ['does] have a
|
||
|
`numeric_limits` specialization.
|
||
|
|
||
|
Note that not all the `std::numeric_limits` member constants and functions are meaningful for all user-defined types (UDT),
|
||
|
such as the decimal and binary multiprecision types provided here. More information on this is given in the sections below.
|
||
|
|
||
|
[h4 infinity]
|
||
|
|
||
|
For floating-point types, [infin] is defined wherever possible,
|
||
|
but clearly infinity is meaningless for __arbitrary_precision arithmetic backends,
|
||
|
and there is one floating-point type (GMP's `mpf_t`, see __mpf_float) which has no notion
|
||
|
of infinity or NaN at all.
|
||
|
|
||
|
A typical test whether infinity is implemented is
|
||
|
|
||
|
if(std::numeric_limits<T>::has_infinity)
|
||
|
{
|
||
|
std::cout << std::numeric_limits<T>::infinity() << std::endl;
|
||
|
}
|
||
|
|
||
|
and using tests like this is strongly recommended to improve portability.
|
||
|
|
||
|
[warning If the backend is switched to a type that does not support infinity (or similarly NaNs) then,
|
||
|
without checks like this, there will be trouble.]
|
||
|
|
||
|
[h4 is_signed]
|
||
|
|
||
|
`std::numeric_limits<T>::is_signed == true` if the type `T` is signed.
|
||
|
|
||
|
For __fundamental binary types, the sign is held in a single bit,
|
||
|
but for other types (`cpp_dec_float` and `cpp_bin_float`)
|
||
|
it may be a separate storage element, usually `bool`.
|
||
|
|
||
|
[h4 is_exact]
|
||
|
|
||
|
`std::numeric_limits<T>::is_exact == true` if type T uses exact representations.
|
||
|
|
||
|
This is defined as `true` for all integer types and `false` for floating-point types.
|
||
|
|
||
|
[@http://stackoverflow.com/questions/14203654/stdnumeric-limitsis-exact-what-is-a-usable-definition A usable definition]
|
||
|
has been discussed.
|
||
|
|
||
|
ISO/IEC 10967-1, Language independent arithmetic, noted by the C++ Standard defines
|
||
|
|
||
|
A floating-point type F shall be a finite subset of [real].
|
||
|
|
||
|
The important practical distinction is that all integers (up to `max()`) can be stored exactly.
|
||
|
|
||
|
[@http://en.wikipedia.org/wiki/Rational_number Rational]
|
||
|
types using two integer types are also exact.
|
||
|
|
||
|
Floating-point types [*cannot store all real values]
|
||
|
(those in the set of [real]) [*exactly].
|
||
|
For example, 0.5 can be stored exactly in a binary floating-point, but 0.1 cannot.
|
||
|
What is stored is the nearest representable real value, that is, rounded to nearest.
|
||
|
|
||
|
Fixed-point types (usually decimal) are also defined as exact, in that they only
|
||
|
store a [*fixed precision], so half cents or pennies (or less) cannot be stored.
|
||
|
The results of computations are rounded up or down,
|
||
|
just like the result of integer division stored as an integer result.
|
||
|
|
||
|
There are number of proposals to
|
||
|
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3407.html
|
||
|
add Decimal floating-point Support to C++].
|
||
|
|
||
|
[@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2849.pdf Decimal TR].
|
||
|
|
||
|
And also
|
||
|
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3352.html
|
||
|
C++ Binary Fixed-Point Arithmetic].
|
||
|
|
||
|
[h4 is_bounded]
|
||
|
|
||
|
`std::numeric_limits<T>::is_bounded == true` if the set of values represented by the type `T` is finite.
|
||
|
|
||
|
This is `true` for all __fundamental_type integer, fixed and floating-point types,
|
||
|
and most multi-precision types.
|
||
|
|
||
|
It is only `false` for a few __arbitrary_precision types like `cpp_int`.
|
||
|
|
||
|
Rational and fixed-exponent representations are exact but not integer.
|
||
|
|
||
|
[h4 is_modulo]
|
||
|
|
||
|
`std::numeric_limits<T>::is_modulo` is defined as `true` if adding two positive values of type T
|
||
|
can yield a result less than either value.
|
||
|
|
||
|
`is_modulo == true` means that the type does not overflow, but, for example,
|
||
|
'wraps around' to zero, when adding one to the `max()` value.
|
||
|
|
||
|
For most __fundamental integer types, `std::numeric_limits<>::is_modulo` is `true`.
|
||
|
|
||
|
`bool` is the only exception.
|
||
|
|
||
|
The modulo behaviour is sometimes useful,
|
||
|
but also can be unexpected, and sometimes undesired, behaviour.
|
||
|
|
||
|
Overflow of signed integers can be especially unexpected,
|
||
|
possibly causing change of sign.
|
||
|
|
||
|
Boost.Multiprecision integer type `cpp_int` is not modulo
|
||
|
because as an __arbitrary_precision types,
|
||
|
it expands to hold any value that the machine resources permit.
|
||
|
|
||
|
However fixed precision __cpp_int's may be modulo if they are unchecked
|
||
|
(i.e. they behave just like __fundamental integers), but not if they are checked
|
||
|
(overflow causes an exception to be raised).
|
||
|
|
||
|
__fundamental and multi-precision floating-point types are normally not modulo.
|
||
|
|
||
|
Where possible, overflow is to `std::numeric_limits<>::infinity()`,
|
||
|
provided `std::numeric_limits<>::has_infinity == true`.
|
||
|
|
||
|
[h4 radix]
|
||
|
|
||
|
Constant `std::numeric_limits<T>::radix` returns either 2 (for __fundamental and binary types)
|
||
|
or 10 (for decimal types).
|
||
|
|
||
|
[h4 digits]
|
||
|
|
||
|
The number of `radix` digits that be represented without change:
|
||
|
|
||
|
* for integer types, the number of [*non-sign bits] in the significand.
|
||
|
* for floating types, the number of [*radix digits] in the significand.
|
||
|
|
||
|
The values include any implicit bit, so for example, for the ubiquious
|
||
|
`double` using 64 bits
|
||
|
([@http://en.wikipedia.org/wiki/Double_precision_floating-point_format IEEE binary64 ]),
|
||
|
`digits` == 53, even though there are only 52 actual bits of the significand stored in the representation.
|
||
|
The value of `digits` reflects the fact that there is one implicit bit which is always set to 1.
|
||
|
|
||
|
The Boost.Multiprecision binary types do not use an implicit bit, so the
|
||
|
`digits` member reflects exactly how many bits of precision were requested:
|
||
|
|
||
|
typedef number<cpp_bin_float<53, digit_base_2> > float64;
|
||
|
typedef number<cpp_bin_float<113, digit_base_2> > float128;
|
||
|
std::numeric_limits<float64>::digits == 53.
|
||
|
std::numeric_limits<float128>::digits == 113.
|
||
|
|
||
|
For the most common case of `radix == 2`,
|
||
|
`std::numeric_limits<T>::digits` is the number of bits in the representation,
|
||
|
not counting any sign bit.
|
||
|
|
||
|
For a decimal integer type, when `radix == 10`, it is the number of decimal digits.
|
||
|
|
||
|
[h4 digits10]
|
||
|
|
||
|
Constant `std::numeric_limits<T>::digits10` returns the number of
|
||
|
decimal digits that can be represented without change or loss.
|
||
|
|
||
|
For example, `numeric_limits<unsigned char>::digits10` is 2.
|
||
|
|
||
|
This somewhat inscrutable definition means that an `unsigned char`
|
||
|
can hold decimal values `0..99`
|
||
|
without loss of precision or accuracy, usually from truncation.
|
||
|
|
||
|
Had the definition been 3 then that would imply it could hold 0..999,
|
||
|
but as we all know, an 8-bit `unsigned char` can only hold 0..255,
|
||
|
and an attempt to store 256 or more will involve loss or change.
|
||
|
|
||
|
For bounded integers, it is thus [*one less] than number of decimal digits
|
||
|
you need to display the biggest integer `std::numeric_limits<T>::max()`.
|
||
|
This value can be used to predict the layout width required for
|
||
|
|
||
|
[digits10_1]
|
||
|
|
||
|
For example, `unsigned short` is often stored in 16 bits,
|
||
|
so the maximum value is 0xFFFF or 65535.
|
||
|
|
||
|
[digits10_2]
|
||
|
|
||
|
|
||
|
For bounded floating-point types,
|
||
|
if we create a `double` with a value with `digits10` (usually 15) decimal digits,
|
||
|
`1e15` or `1000000000000000` :
|
||
|
|
||
|
[digits10_3]
|
||
|
|
||
|
and we can increment this value to `1000000000000001`
|
||
|
as expected and show the difference too.
|
||
|
|
||
|
But if we try to repeat this with more than `digits10` digits,
|
||
|
|
||
|
[digits10_4]
|
||
|
|
||
|
then we find that when we add one it has no effect,
|
||
|
and display show that there is loss of precision. See
|
||
|
[@http://en.wikipedia.org/wiki/Loss_of_significance Loss of significance or cancellation error].
|
||
|
|
||
|
So `digits10` is the number of decimal digits [*guaranteed] to be correct.
|
||
|
|
||
|
For example, 'round-tripping' for `double`:
|
||
|
|
||
|
* If a decimal string with at most `digits10`( == 15) significant decimal digits
|
||
|
is converted to `double` and then converted back to the
|
||
|
same number of significant decimal digits,
|
||
|
then the final string will match the original 15 decimal digit string.
|
||
|
* If a `double` floating-point number is converted to a decimal string
|
||
|
with at least 17 decimal digits
|
||
|
and then converted back to `double`,
|
||
|
then the result will be binary identical to the original `double` value.
|
||
|
|
||
|
For most purposes, you will much more likely want
|
||
|
`std::numeric_limits<>::max_digits10`,
|
||
|
the number of decimal digits that ensure that a change of one least significant bit (__ULP)
|
||
|
produces a different decimal digits string.
|
||
|
|
||
|
For the most common `double` floating-point type,`max_digits10` is `digits10+2`,
|
||
|
but you should use C++11 `max_digits10`
|
||
|
where possible (see [link boost_multiprecision.tut.limits.constants.max_digits10 below]).
|
||
|
|
||
|
[h4:max_digits10 max_digits10]
|
||
|
|
||
|
`std::numeric_limits<T>::max_digits10` was added for floating-point
|
||
|
because `digits10` decimal digits are insufficient to show
|
||
|
a least significant bit (ULP) change giving puzzling displays like
|
||
|
|
||
|
0.666666666666667 != 0.666666666666667
|
||
|
|
||
|
from failure to 'round-trip', for example:
|
||
|
|
||
|
[max_digits10_2]
|
||
|
|
||
|
If you wish to ensure that a change of one least significant bit (ULP)
|
||
|
produces a different decimal digits string,
|
||
|
then `max_digits10` is the precision to use.
|
||
|
|
||
|
For example:
|
||
|
|
||
|
[max_digits10_3]
|
||
|
|
||
|
will display [pi] to the maximum possible precision using a `double`.
|
||
|
|
||
|
[max_digits10_4]
|
||
|
|
||
|
For integer types, `max_digits10` is implementation-dependent,
|
||
|
but is usually `digits10 + 2`.
|
||
|
This is the output field-width required for the maximum value of the type T
|
||
|
`std::numeric_limits<T>::max()` ['including a sign and a space].
|
||
|
|
||
|
So this will produce neat columns.
|
||
|
|
||
|
std::cout << std::setw(std::numeric_limits<int>::max_digits10) ...
|
||
|
|
||
|
The extra two or three least-significant digits are 'noisy' and may be junk,
|
||
|
but if you want to 'round-trip' - printing a value out as a decimal digit string and reading it back in -
|
||
|
(most commonly during serialization and de-serialization)
|
||
|
you must use `os.precision(std::numeric_limits<T>::max_digits10)`.
|
||
|
|
||
|
[note For Microsoft Visual Studio 2010,
|
||
|
`std::numeric_limits<float>::max_digits10` is wrongly defined as 8. It should be 9.
|
||
|
]
|
||
|
|
||
|
[note For Microsoft Visual Studio before 2013 and the default floating-point
|
||
|
format, a small range of double-precision floating-point values with a
|
||
|
significand of approximately 0.0001 to 0.004 and exponent values of 1010 to
|
||
|
1014 do not round-trip exactly being off by one least significant bit,
|
||
|
for probably every third value of the significand.
|
||
|
|
||
|
A workaround is using the scientific or exponential format `std::scientific`.
|
||
|
|
||
|
Other older compilers also fail to implement round-tripping entirely fault-free, for example, see
|
||
|
[@https://www.exploringbinary.com/incorrectly-rounded-conversions-in-gcc-and-glibc/ Incorrectly Rounded Conversions in GCC and GLIBC].
|
||
|
|
||
|
For more details see
|
||
|
[@https://www.exploringbinary.com/incorrect-round-trip-conversions-in-visual-c-plus-plus/ Incorrect Round-Trip Conversions in Visual C++],
|
||
|
and references therein
|
||
|
and
|
||
|
[@https://arxiv.org/pdf/1310.8121.pdf Easy Accurate Reading and Writing of Floating-Point Numbers, Aubrey Jaffer (August 2018)].
|
||
|
|
||
|
Microsoft VS2017 and other recent compilers, now use the
|
||
|
[@https://doi.org/10.1145/3192366.3192369 Ryu fast float-to-string conversion by Ulf Adams]
|
||
|
algorithm, claimed to be both exact and fast for 32 and 64-bit floating-point numbers.
|
||
|
] [/note]
|
||
|
|
||
|
[h4 round_style]
|
||
|
|
||
|
The rounding style determines how the result of floating-point operations
|
||
|
is treated when the result cannot be [*exactly represented] in the significand.
|
||
|
Various rounding modes may be provided:
|
||
|
|
||
|
* round to nearest up or down (default for floating-point types).
|
||
|
* round up (toward positive infinity).
|
||
|
* round down (toward negative infinity).
|
||
|
* round toward zero (integer types).
|
||
|
* no rounding (if decimal radix).
|
||
|
* rounding mode is not determinable.
|
||
|
|
||
|
For integer types, `std::numeric_limits<T>::round_style` is always towards zero, so
|
||
|
|
||
|
std::numeric_limits<T>::round_style == std::round_to_zero;
|
||
|
|
||
|
A decimal type, `cpp_dec_float` rounds in no particular direction,
|
||
|
which is to say it doesn't round at all.
|
||
|
And since there are several guard digits,
|
||
|
it's not really the same as truncation (round toward zero) either.
|
||
|
|
||
|
For floating-point types, it is normal to round to nearest.
|
||
|
|
||
|
std::numeric_limits<T>::round_style == std::round_to_nearest;
|
||
|
|
||
|
See function `std::numeric_limits<T>::round_error` for the maximum error (in ULP)
|
||
|
that rounding can cause.
|
||
|
|
||
|
[h4 has_denorm_loss]
|
||
|
|
||
|
`true` if a loss of precision is detected as a
|
||
|
[@http://en.wikipedia.org/wiki/Denormalization denormalization] loss,
|
||
|
rather than an inexact result.
|
||
|
|
||
|
Always `false` for integer types.
|
||
|
|
||
|
`false` for all types which do not have `has_denorm` == `std::denorm_present`.
|
||
|
|
||
|
[h4 denorm_style]
|
||
|
|
||
|
[@http://en.wikipedia.org/wiki/Denormal_number Denormalized values] are
|
||
|
representations with a variable number of exponent bits that can permit
|
||
|
gradual underflow, so that, if type T is `double`.
|
||
|
|
||
|
std::numeric_limits<T>::denorm_min() < std::numeric_limits<T>::min()
|
||
|
|
||
|
A type may have any of the following `enum float_denorm_style` values:
|
||
|
|
||
|
* `std::denorm_absent`, if it does not allow denormalized values.
|
||
|
(Always used for all integer and exact types).
|
||
|
* `std::denorm_present`, if the floating-point type allows denormalized values.
|
||
|
*`std::denorm_indeterminate`, if indeterminate at compile time.
|
||
|
|
||
|
[h4 Tinyness before rounding]
|
||
|
|
||
|
`bool std::numeric_limits<T>::tinyness_before`
|
||
|
|
||
|
`true` if a type can determine that a value is too small
|
||
|
to be represent as a normalized value before rounding it.
|
||
|
|
||
|
Generally true for `is_iec559` floating-point __fundamantal types,
|
||
|
but false for integer types.
|
||
|
|
||
|
Standard-compliant IEEE 754 floating-point implementations may detect the floating-point underflow at three predefined moments:
|
||
|
|
||
|
# After computation of a result with absolute value smaller than
|
||
|
`std::numeric_limits<T>::min()`,
|
||
|
such implementation detects ['tinyness before rounding] (e.g. UltraSparc).
|
||
|
|
||
|
# After rounding of the result to `std::numeric_limits<T>::digits` bits,
|
||
|
if the result is tiny, such implementation detects ['tinyness after rounding]
|
||
|
(e.g. SuperSparc).
|
||
|
|
||
|
# If the conversion of the rounded tiny result to subnormal form
|
||
|
resulted in the loss of precision, such implementation detects ['denorm loss].
|
||
|
|
||
|
[endsect] [/section:constants std::numeric_limits<> Constants]
|
||
|
|
||
|
[section:functions `std::numeric_limits<>` functions]
|
||
|
|
||
|
[h4:max_function `max` function]
|
||
|
|
||
|
Function `(std::numeric_limits<T>::max)()` returns the largest finite value
|
||
|
that can be represented by the type T. If there is no such value (and
|
||
|
`numeric_limits<T>::bounded` is `false`) then returns `T()`.
|
||
|
|
||
|
For __fundamental types there is usually a corresponding MACRO value TYPE_MAX,
|
||
|
where TYPE is CHAR, INT, FLOAT etc.
|
||
|
|
||
|
Other types, including those provided by a typedef,
|
||
|
for example `INT64_T_MAX` for `int64_t`, may provide a macro definition.
|
||
|
|
||
|
To cater for situations where no `numeric_limits` specialization is available
|
||
|
(for example because the precision of the type varies at runtime),
|
||
|
packaged versions of this (and other functions) are provided using
|
||
|
|
||
|
#include <boost/math/tools/precision.hpp>
|
||
|
|
||
|
T = boost::math::tools::max_value<T>();
|
||
|
|
||
|
Of course, these simply use `(std::numeric_limits<T>::max)()` if available,
|
||
|
but otherwise 'do something sensible'.
|
||
|
|
||
|
[h4 lowest function]
|
||
|
|
||
|
Since C++11: `std::numeric_limits<T>::lowest()` is
|
||
|
|
||
|
* For integral types, the same as function `min()`.
|
||
|
* For floating-point types, generally the negative of `max()`
|
||
|
(but implementation-dependent).
|
||
|
|
||
|
[digits10_5]
|
||
|
|
||
|
[h4:min_function `min` function]
|
||
|
|
||
|
Function `(std::numeric_limits<T>::min)()` returns the minimum finite value
|
||
|
that can be represented by the type T.
|
||
|
|
||
|
For __fundamental types, there is usually a corresponding MACRO value TYPE_MIN,
|
||
|
where TYPE is CHAR, INT, FLOAT etc.
|
||
|
|
||
|
Other types, including those provided by a `typedef`,
|
||
|
for example, `INT64_T_MIN` for `int64_t`, may provide a macro definition.
|
||
|
|
||
|
For floating-point types,
|
||
|
it is more fully defined as the ['minimum positive normalized value].
|
||
|
|
||
|
See `std::numeric_limits<T>::denorm_min()` for the smallest denormalized value, provided
|
||
|
|
||
|
std::numeric_limits<T>::has_denorm == std::denorm_present
|
||
|
|
||
|
To cater for situations where no `numeric_limits` specialization is available
|
||
|
(for example because the precision of the type varies at runtime),
|
||
|
packaged versions of this (and other functions) are provided using
|
||
|
|
||
|
#include <boost/math/tools/precision.hpp>
|
||
|
|
||
|
T = boost::math::tools::min_value<T>();
|
||
|
|
||
|
Of course, these simply use `std::numeric_limits<T>::min()` if available.
|
||
|
|
||
|
[h4 denorm_min function]
|
||
|
|
||
|
Function `std::numeric_limits<T>::denorm_min()`
|
||
|
returns the smallest
|
||
|
[@http://en.wikipedia.org/wiki/Denormal_number denormalized value],
|
||
|
provided
|
||
|
|
||
|
std::numeric_limits<T>::has_denorm == std::denorm_present
|
||
|
|
||
|
[denorm_min_1]
|
||
|
|
||
|
The exponent is effectively reduced from -308 to -324
|
||
|
(though it remains encoded as zero and leading zeros appear in the significand,
|
||
|
thereby losing precision until the significand reaches zero).
|
||
|
|
||
|
[h4 round_error]
|
||
|
|
||
|
Function `std::numeric_limits<T>::round_error()` returns the maximum error
|
||
|
(in units of __ULP)
|
||
|
that can be caused by any basic arithmetic operation.
|
||
|
|
||
|
round_style == std::round_indeterminate;
|
||
|
|
||
|
The rounding style is indeterminable at compile time.
|
||
|
|
||
|
For floating-point types, when rounding is to nearest,
|
||
|
only half a bit is lost by rounding, and `round_error == 0.5`.
|
||
|
In contrast when rounding is towards zero, or plus/minus infinity,
|
||
|
we can loose up to one bit from rounding, and `round_error == 1`.
|
||
|
|
||
|
For integer types, rounding always to zero, so at worst almost one bit can be rounded,
|
||
|
so `round_error == 1`.
|
||
|
|
||
|
`round_error()` can be used with `std::numeric_limits<T>::epsilon()` to estimate
|
||
|
the maximum potential error caused by rounding. For typical floating-point types,
|
||
|
`round_error() = 1/2`, so half epsilon is the maximum potential error.
|
||
|
|
||
|
[round_error_1]
|
||
|
|
||
|
There are, of course, many occasions when much bigger loss of precision occurs,
|
||
|
for example, caused by
|
||
|
[@http://en.wikipedia.org/wiki/Loss_of_significance Loss of significance or cancellation error]
|
||
|
or very many iterations.
|
||
|
|
||
|
[h4:epsilon epsilon]
|
||
|
|
||
|
Function `std::numeric_limits<T>::epsilon()` is meaningful only for non-integral types.
|
||
|
|
||
|
It returns the difference between `1.0` and the next value representable
|
||
|
by the floating-point type T.
|
||
|
So it is a one least-significant-bit change in this floating-point value.
|
||
|
|
||
|
For `double` (`float_64t`) it is `2.2204460492503131e-016`
|
||
|
showing all possibly significant 17 decimal digits.
|
||
|
|
||
|
[epsilon_1]
|
||
|
|
||
|
We can explicitly increment by one bit using the function `boost::math::float_next()`
|
||
|
and the result is the same as adding `epsilon`.
|
||
|
|
||
|
[epsilon_2]
|
||
|
|
||
|
Adding any smaller value, like half `epsilon`, will have no effect on this value.
|
||
|
|
||
|
[epsilon_3]
|
||
|
|
||
|
So this cancellation error leaves the values equal, despite adding half `epsilon`.
|
||
|
|
||
|
To achieve greater portability over platform and floating-point type,
|
||
|
Boost.Math and Boost.Multiprecision provide a package of functions that
|
||
|
'do something sensible' if the standard `numeric_limits` is not available.
|
||
|
To use these `#include <boost/math/tools/precision.hpp>`.
|
||
|
|
||
|
[epsilon_4]
|
||
|
|
||
|
[h5:FP_tolerance Tolerance for Floating-point Comparisons]
|
||
|
|
||
|
[@https://en.wikipedia.org/wiki/Machine_epsilon Machine epsilon [epsilon]]
|
||
|
is very useful to compute a tolerance when comparing floating-point values,
|
||
|
a much more difficult task than is commonly imagined.
|
||
|
|
||
|
The C++ standard specifies [@https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon `std::numeric_limits<>::epsilon()`]
|
||
|
and Boost.Multiprecision implements this (where possible) for its program-defined types analogous to the
|
||
|
__fundamental floating-point types like `double` `float`.
|
||
|
|
||
|
For more information than you probably want (but still need) see
|
||
|
[@http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html What Every Computer Scientist Should Know About Floating-Point Arithmetic]
|
||
|
|
||
|
The naive test comparing the absolute difference between two values and a tolerance
|
||
|
does not give useful results if the values are too large or too small.
|
||
|
|
||
|
So Boost.Test uses an algorithm first devised by Knuth
|
||
|
for reliably checking if floating-point values are close enough.
|
||
|
|
||
|
See Donald. E. Knuth. The art of computer programming (vol II).
|
||
|
Copyright 1998 Addison-Wesley Longman, Inc., 0-201-89684-2.
|
||
|
Addison-Wesley Professional; 3rd edition. (The relevant equations are in paragraph 4.2.2, Eq. 36 and 37.)
|
||
|
|
||
|
See [@https://www.boost.org/doc/libs/release/libs/test/doc/html/boost_test/testing_tools/extended_comparison/floating_point/floating_points_comparison_theory.html Boost.Math floating_point comparison]
|
||
|
for more details.
|
||
|
|
||
|
See also:
|
||
|
|
||
|
[@http://adtmag.com/articles/2000/03/15/comparing-floats-how-to-determine-if-floating-quantities-are-close-enough-once-a-tolerance-has-been.aspx Alberto Squassia, Comparing floats]
|
||
|
|
||
|
[@http://adtmag.com/articles/2000/03/16/comparing-floats-how-to-determine-if-floating-quantities-are-close-enough-once-a-tolerance-has-been.aspx Alberto Squassia, Comparing floats code]
|
||
|
|
||
|
[@https://www.boost.org/doc/libs/release/libs/test/doc/html/boost_test/testing_tools/extended_comparison/floating_point.html Boost.Test Floating-Point_Comparison]
|
||
|
|
||
|
[tolerance_1]
|
||
|
|
||
|
used thus:
|
||
|
|
||
|
cd ./test
|
||
|
BOOST_CHECK_CLOSE_FRACTION(expected, calculated, tolerance);
|
||
|
|
||
|
(There is also a version BOOST_CHECK_CLOSE using tolerance as a [*percentage] rather than a fraction;
|
||
|
usually the fraction version is simpler to use).
|
||
|
|
||
|
[tolerance_2]
|
||
|
|
||
|
[h4:infinity Infinity - positive and negative]
|
||
|
|
||
|
For floating-point types only, for which
|
||
|
`std::numeric_limits<T>::has_infinity == true`,
|
||
|
function `std::numeric_limits<T>::infinity()`
|
||
|
provides an implementation-defined representation for [infin].
|
||
|
|
||
|
The 'representation' is a particular bit pattern reserved for infinity.
|
||
|
For IEEE754 system (for which `std::numeric_limits<T>::is_iec559 == true`)
|
||
|
[@http://en.wikipedia.org/wiki/IEEE_754-1985#Positive_and_negative_infinity positive and negative infinity]
|
||
|
are assigned bit patterns for all defined floating-point types.
|
||
|
|
||
|
Confusingly, the string resulting from outputting this representation, is also
|
||
|
implementation-defined. And the string that can be input to generate the representation is also implementation-defined.
|
||
|
|
||
|
For example, the output is `1.#INF` on Microsoft systems, but `inf` on most *nix platforms.
|
||
|
|
||
|
This implementation-defined-ness has hampered use of infinity (and NaNs)
|
||
|
but __Boost_Math and __Boost_Multiprecision work hard to provide a sensible representation
|
||
|
for [*all] floating-point types, not just the __fundamental_types,
|
||
|
which with the use of suitable facets to define the input and output strings, makes it possible
|
||
|
to use these useful features portably and including __Boost_Serialization.
|
||
|
|
||
|
[h4 Not-A-Number NaN]
|
||
|
|
||
|
[h5 Quiet_NaN]
|
||
|
|
||
|
For floating-point types only, for which
|
||
|
`std::numeric_limits<T>::has_quiet_NaN == true`,
|
||
|
function `std::numeric_limits<T>::quiet_NaN()`
|
||
|
provides an implementation-defined representation for NaN.
|
||
|
|
||
|
[@http://en.wikipedia.org/wiki/NaN NaNs] are values to indicate that the
|
||
|
result of an assignment or computation is meaningless.
|
||
|
A typical example is `0/0` but there are many others.
|
||
|
|
||
|
NaNs may also be used, to represent missing values: for example,
|
||
|
these could, by convention, be ignored in calculations of statistics like means.
|
||
|
|
||
|
Many of the problems with a representation for
|
||
|
[@http://en.wikipedia.org/wiki/NaN Not-A-Number] has hampered portable use,
|
||
|
similar to those with infinity.
|
||
|
|
||
|
[nan_1]
|
||
|
|
||
|
But using Boost.Math and suitable facets can permit portable use
|
||
|
of both NaNs and positive and negative infinity.
|
||
|
|
||
|
[facet_1]
|
||
|
|
||
|
[h5 Signaling NaN]
|
||
|
|
||
|
For floating-point types only, for which
|
||
|
`std::numeric_limits<T>::has_signaling_NaN == true`,
|
||
|
function `std::numeric_limits<T>::signaling_NaN()`
|
||
|
provides an implementation-defined representation for NaN that causes a hardware trap.
|
||
|
It should be noted however, that at least one implementation of this function causes a hardware
|
||
|
trap to be triggered simply by calling `std::numeric_limits<T>::signaling_NaN()`, and not only
|
||
|
by using the value returned.
|
||
|
|
||
|
[endsect] [/section:functions std::numeric_limits<> functions]
|
||
|
|
||
|
[/ Tables of values for numeric_limits for various __fundamental and cpp_bin_float types]
|
||
|
[include numeric_limits_32_tables.qbk]
|
||
|
[/include numeric_limits_64_tables.qbk]
|
||
|
|
||
|
[section:how_to_tell How to Determine the Kind of a Number From `std::numeric_limits`]
|
||
|
|
||
|
Based on the information above, one can see that different kinds of numbers can be
|
||
|
differentiated based on the information stored in `std::numeric_limits`. This is
|
||
|
in addition to the traits class [link boost_multiprecision.ref.number.traits_class_support
|
||
|
number_category] provided by this library.
|
||
|
|
||
|
[h4 Integer Types]
|
||
|
|
||
|
For an integer type T, all of the following conditions hold:
|
||
|
|
||
|
std::numeric_limits<T>::is_specialized == true
|
||
|
std::numeric_limits<T>::is_integer == true
|
||
|
std::numeric_limits<T>::is_exact == true
|
||
|
std::numeric_limits<T>::min_exponent == 0
|
||
|
std::numeric_limits<T>::max_exponent == 0
|
||
|
std::numeric_limits<T>::min_exponent10 == 0
|
||
|
std::numeric_limits<T>::max_exponent10 == 0
|
||
|
|
||
|
In addition the type is /signed/ if:
|
||
|
|
||
|
std::numeric_limits<T>::is_signed == true
|
||
|
|
||
|
If the type is arbitrary precision then:
|
||
|
|
||
|
std::numeric_limits<T>::is_bounded == false
|
||
|
|
||
|
Otherwise the type is bounded, and returns a non zero value
|
||
|
from:
|
||
|
|
||
|
std::numeric_limits<T>::max()
|
||
|
|
||
|
and has:
|
||
|
|
||
|
std::numeric_limits<T>::is_modulo == true
|
||
|
|
||
|
if the type implements modulo arithmetic on overflow.
|
||
|
|
||
|
[h4 Rational Types]
|
||
|
|
||
|
Rational types are just like integers except that:
|
||
|
|
||
|
std::numeric_limits<T>::is_integer == false
|
||
|
|
||
|
[h4 Fixed Precision Types]
|
||
|
|
||
|
There appears to be no way to tell these apart from rational types, unless they set:
|
||
|
|
||
|
std::numeric_limits<T>::is_exact == false
|
||
|
|
||
|
This is because these types are in essence a rational type with a fixed denominator.
|
||
|
|
||
|
[h4 floating-point Types]
|
||
|
|
||
|
For a floating-point type T, all of the following conditions hold:
|
||
|
|
||
|
std::numeric_limits<T>::is_specialized == true
|
||
|
std::numeric_limits<T>::is_integer == false
|
||
|
std::numeric_limits<T>::is_exact == false
|
||
|
std::numeric_limits<T>::min_exponent != 0
|
||
|
std::numeric_limits<T>::max_exponent != 0
|
||
|
std::numeric_limits<T>::min_exponent10 != 0
|
||
|
std::numeric_limits<T>::max_exponent10 != 0
|
||
|
|
||
|
In addition the type is /signed/ if:
|
||
|
|
||
|
std::numeric_limits<T>::is_signed == true
|
||
|
|
||
|
And the type may be decimal or binary depending on the value of:
|
||
|
|
||
|
std::numeric_limits<T>::radix
|
||
|
|
||
|
In general, there are no arbitrary precision floating-point types, and so:
|
||
|
|
||
|
std::numeric_limits<T>::is_bounded == false
|
||
|
|
||
|
[h4 Exact floating-point Types]
|
||
|
|
||
|
Exact floating-point types are a [@http://en.wikipedia.org/wiki/Field_%28mathematics%29 field]
|
||
|
composed of an arbitrary precision integer scaled by an exponent. Such types
|
||
|
have no division operator and are the same as floating-point types except:
|
||
|
|
||
|
std::numeric_limits<T>::is_exact == true
|
||
|
|
||
|
[h4 Complex Numbers]
|
||
|
|
||
|
For historical reasons, complex numbers do not specialize `std::numeric_limits`, instead you must
|
||
|
inspect `std::numeric_limits<typename T::value_type>`.
|
||
|
|
||
|
[endsect] [/section:how_to_tell How to Determine the Kind of a Number From `std::numeric_limits`]
|
||
|
|
||
|
[endsect] [/section:limits Numeric Limits]
|