boost/libs/multiprecision/doc/tutorial_mixed_precision.qbk

[/
  Copyright 2011 - 2020 John Maddock.
  Copyright 2013 - 2019 Paul A. Bristow.
  Copyright 2013 Christopher Kormanyos.

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:mixed Mixed Precision Arithmetic]

Mixed precision arithmetic is fully supported by the library.

There are three different forms:

* Where the operands are of different precision or types.
* Where the operands are of the same type and precision, but yield a higher precision result.
* Where the operands or result are different but equivalent types (for example types which differ only in their memory management).

[h4 Mixing Operands of Differing Types or Precision]

If the arguments to a binary operator are of different types or precision, then the operation is allowed
as long as there is an unambiguous implicit conversion from one argument type to the other.
In all cases the arithmetic is performed "as if" the lower precision type is promoted to the
higher precision type before applying the operator.  However, particular backends may optimise
this and avoid actually creating a temporary if they are able to do so.

For example:

   mpfr_float_50         a(2), b;
   mpfr_float_100        c(3), d;
   static_mpfr_float_50  e(5), f;
   mpz_int               i(20);

   d = a * c;  // OK, result of operand is an mpfr_float_100.
   b = a * c;  // Error, can't convert the result to an mpfr_float_50 as it will lose digits.
   f = e * i;  // OK, unambiguous conversion from mpz_int to static_mpfr_float_50

[h4 Operands of the Same Precision]

Sometimes you want to apply an operator to two arguments of the same precision in
such a way as to obtain a result of higher precision.  The most common situation
occurs with fixed precision integers, where you want to multiply two N-bit numbers
to obtain a 2N-bit result.  This is supported in this library by the following
free functions:

   template <class ResultType, class Source1 class Source2>
   ResultType& add(ResultType& result, const Source1& a, const Source2& b);

   template <class ResultType, class Source1 class Source2>
   ResultType& subtract(ResultType& result, const Source1& a, const Source2& b);

   template <class ResultType, class Source1 class Source2>
   ResultType& multiply(ResultType& result, const Source1& a, const Source2& b);

These functions apply the named operator to the arguments ['a] and ['b] and store the
result in ['result], returning ['result].  In all cases they behave "as if"
arguments ['a] and ['b] were first promoted to type `ResultType` before applying the
operator, though particular backends may well avoid that step by way of an optimization.

The type `ResultType` must be an instance of class `number`, and the types `Source1` and `Source2`
may be either instances of class `number` or native integer types.  The latter is an optimization
that allows arithmetic to be performed on native integer types producing an extended precision result.

For example:

[mixed_eg]

Produces the output:

[mixed_output]

[h4 Mixing different, but "equivalent" types]

Ordinarily, mixing types of the same precision will produce a compiler error since there is
no unambiguous result type.  However, there is a traits class:

   namespace boost{ namespace multiprecision

   template <class NumberType1, class NumberType2>
   struct is_equivalent_number_type;

   }
   }

When it's `value` const-member value is `true` then the library will treat the types `NumberType1` and `NumberType2` as if
they are interchangeable.  This is typically used to optimise memory management by using two types with differing
memory allocation strategies for different roles.  Typically, we would be using a type with dymanic memory allocation and a minimal
memory footprint for the main storage type (think large arrays or matrices), but a type with internal storage and no dynamic
allocation (but a larger memory footprint) for a few select calculations.

There are three backends that define this trait by default:

* __cpp_int's, provided the two types differ only in their internal cache size.
* __cpp_bin_float's provided they are of the same precision.
* __mpfr_float_backend's provided they are of the same precision.

In addition, while this feature can be used with expression templates turned off, this feature minimises temporaries
and hence memory allocations when expression template are turned on.

By way of an example, consider the dot product of two vectors of __cpp_int's, our first, fairly trivial
implementation might look like this:

[dot_prod_1]

However, in order to reduce the need for memory allocations when constructing the temporaries needed
for the multiply-and-add operations, we could use an equivalent type with a larger internal cache like this:

[dot_prod_2]

Before we compare performance though, there is one other obvious thing we could try.  By simply declaring
a variable for the result of the intermediate multiplications, and reusing that variable each time through
the loop, we might also expect to greatly reduce the number of allocations required.

[dot_prod_3]

We'll begin by comparing how many actual allocations were required to calculate the dot product of 1000
value vectors for random data with various bit counts:

[table
[[Bit Count][Allocations Count Version 1][Allocations Count Version 2][Allocations Count Version 3]]
[[32][1[footnote Here everything fits within __cpp_int's default internal cache, so no allocation are required.]][0][0]]
[[64][1001][1[footnote A single allocation for the return value.]][1]]
[[128][1002][1][2]]
[[256][1002][1][3[footnote Here the input data is such that more than one allocation is required for the temporary.]]]
[[512][1002][1][3]]
[[1024][1002][1001[footnote At this point we exceed the internal cache of our internal calculation type.]][3]]
]

Timings for the three methods are as follows (MSVC-16.8.0, x64):

[table
[[Bit Count][time/ms Version 1][time/ms Version 2][time/ms Version 3]]
[[32][0.021][0.021][0.021]]
[[64][0.032][0.032][0.029]]
[[128][0.099][0.041][0.041]]
[[256][0.154][0.091][0.094]]
[[512][0.323][0.270][0.269]]
[[1024][0.998][0.995][0.949]]
]

As you can see, there is a sweet spot for middling-sized integers where we gain: if the values are small, then
__cpp_int's own internal cache is large enough anyway, and no allocation occur.  Conversely, if the values are
sufficiently large, then the cost of the actual arithmetic dwarfs the memory allocation time.  In this particular
case, carefully writing the code (version 3) is clearly at least as good as using a separate type with a larger cache.
However, there may be times when it's not practical to re-write existing code, purely to optimise it for the
multiprecision use case.

A typical example where we can't rewrite our code to avoid unnecessary allocations, occurs when we're calling an
external routine.  For example the arc length of an ellipse with radii ['a] and ['b] is given by:

[pre L(a, b) = 4aE(k)]

with:

[pre k = [sqrt](1 - b[super 2]/a[super 2])]

where ['E(k)] is the complete elliptic integral of the second kind, which is available as a template function `ellint_2` in Boost.Math.

Naively, we might implement this for use with __mpfr_float_backend like this:

[elliptic_arc1]

But we might also try mixing our arithmetic types - regular dynamically allocated __mpfr_float_backend's for the
interface to minimise memory footprint in our external storage, and statically allocated __mpfr_float_backend's
for the internal arithmetic:

[elliptic_arc2]

The performance comparisons are surprisingly stark:

[table
[[N][`number<mpfr_float_backend<N>>` / ms][`number<mpfr_float_backend<N, allocate_stack>>` / ms]]
[[30][19.5][3.1]]
[[40][12.5][6.2]]
[[50][14.4][6.6]]
[[60][18.0][9.5]]
[[70][18.0][9.6]]
[[80][20.0][12.8]]
]

As before, the results are for MSVC-16.8.0/x64, and in point of fact, the results do not always favour
non-allocating types so much, it does depend very much on the special function being called and/or the arguments used.

[h4 Backends With Optimized Mixed Precision Arithmetic]

The following backends have at least some direct support for mixed-precision arithmetic,
and therefore avoid creating unnecessary temporaries when using the interfaces above.
Therefore when using these types it's more efficient to use mixed-precision arithmetic,
than it is to explicitly cast the operands to the result type:

__mpfr_float_backend, __mpf_float, __cpp_int.

[endsect] [/section:mixed Mixed Precision Arithmetic]