boost/libs/multiprecision/doc/reference_cpp_bin_float.qbk

[/
  Copyright 2011 - 2020 John Maddock.
  Copyright 2013 - 2019 Paul A. Bristow.
  Copyright 2013 Christopher Kormanyos.

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:cpp_bin_float_ref cpp_bin_float]

   namespace boost{ namespace multiprecision{

   enum digit_base_type
   {
      digit_base_2 = 2,
      digit_base_10 = 10
   };

   template <unsigned Digits, digit_base_type base = digit_base_10, class Allocator = void, class Exponent = int, ExponentMin = 0, ExponentMax = 0>
   class cpp_bin_float;

   typedef number<cpp_bin_float<50> > cpp_bin_float_50;
   typedef number<cpp_bin_float<100> > cpp_bin_float_100;

   typedef number<backends::cpp_bin_float<24, backends::digit_base_2, void, std::int16_t, -126, 127>, et_off>         cpp_bin_float_single;
   typedef number<backends::cpp_bin_float<53, backends::digit_base_2, void, std::int16_t, -1022, 1023>, et_off>       cpp_bin_float_double;
   typedef number<backends::cpp_bin_float<64, backends::digit_base_2, void, std::int16_t, -16382, 16383>, et_off>     cpp_bin_float_double_extended;
   typedef number<backends::cpp_bin_float<113, backends::digit_base_2, void, std::int16_t, -16382, 16383>, et_off>    cpp_bin_float_quad;
   typedef number<backends::cpp_bin_float<237, backends::digit_base_2, void, std::int32_t, -262142, 262143>, et_off>  cpp_bin_float_oct;

   }} // namespaces

Class template `cpp_bin_float` fulfils all of the requirements for a [link boost_multiprecision.ref.backendconc Backend] type.
Its members and non-member functions are deliberately not documented: these are considered implementation details that are subject
to change.

The class takes six template parameters:

[variablelist
[[Digits][The number of digits precision the type
should support.  This is normally expressed as base-10 digits, but that can be changed via the second template parameter.]]
[[base][An enumerated value (either `digit_base_10` or `digit_base_2`) that indicates whether `Digits` is base-10 or base-2]]
[[Allocator][The allocator used: defaults to type `void`, meaning all storage is within the class, and no dynamic
allocation is performed, but can be set to a standard library allocator if dynamic allocation makes more sense.]]
[[Exponent][A signed integer type to use as the type of the exponent - defaults to `int`.]]
[[ExponentMin][The smallest (most negative) permitted exponent, defaults to zero, meaning "define as small as possible
given the limitations of the type and our internal requirements".]]
[[ExponentMax][The largest (most positive) permitted exponent, defaults to zero, meaning "define as large as possible
given the limitations of the type and our internal requirements".]]
]

The type of `number_category<cpp_bin_float<Args...> >::type` is `std::integral_constant<int, number_kind_floating_point>`.

More information on this type can be found in the [link boost_multiprecision.tut.floats.cpp_bin_float tutorial].

[h4 Implementation Notes]

Internally, an N-bit `cpp_bin_float` is represented as an N-bit unsigned integer along with an exponent and a sign.
The integer part is normalized so that its most significant bit is always 1.  The decimal point is assumed to be
directly after the most significant bit of the integer part.  The special values zero, infinity and NaN all have
the integer part set to zero, and the exponent to one of 3 special values above the maximum permitted exponent.

Multiplication is trivial: multiply the two N-bit integer mantissa's to obtain a 2N-bit number, then round and
adjust the sign and exponent.

Addition and subtraction proceed similarly - if the exponents are such that there is overlap between the two
values, then left shift the larger value to produce a number with between N and 2N bits, then perform integer
addition or subtraction, round, and adjust the exponent.

Division proceeds as follows: first scale the numerator by some power of 2 so that integer division will
produce either an N-bit or N+1 bit result plus a remainder.  If we get an N bit result then the size of
twice the remainder compared to the denominator gives us the rounding direction.  Otherwise we have one extra bit
in the result which we can use to determine rounding (in this case ties occur only if the remainder is zero and
the extra bit is a 1).

Square root uses integer square root in a manner analogous to division.

Decimal string to binary conversion proceeds as follows: first parse the digits to
produce an integer multiplied by a decimal exponent.  Note that we stop parsing digits
once we have parsed as many as can possibly effect the result - this stops the integer
part growing too large when there are a very large number of input digits provided.
At this stage if the decimal exponent is positive then the result is an integer and we
can in principle simply multiply by 10^N to get an exact integer result.  In practice
however, that could produce some very large integers.  We also need to be able to divide
by 10^N in the event that the exponent is negative.  Therefore calculation of the 10^N
values plus the multiplication or division are performed using limited precision
integer arithmetic, plus an exponent, and a track of the accumulated error.  At the end of
the calculation we will either be able to round unambiguously, or the error will be such
that we can't tell which way to round.  In the latter case we simply up the precision and try
again until we have an unambiguously rounded result.

Binary to decimal conversion proceeds very similarly to the above, our aim is to calculate
`mantissa * 2^shift * 10^E` where `E` is the decimal exponent and `shift` is calculated
so that the result is an N bit integer assuming we want N digits printed in the result.
As before we use limited precision arithmetic to calculate the result and up the
precision as necessary until the result is unambiguously correctly rounded.  In addition
our initial calculation of the decimal exponent may be out by 1, so we have to correct
that and loop as well in the that case.

[endsect]