[/
  Copyright 2011 - 2020 John Maddock.
  Copyright 2013 - 2019 Paul A. Bristow.
  Copyright 2013 Christopher Kormanyos.

  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:mixed Mixed Precision Arithmetic]

Mixed precision arithmetic is fully supported by the library.

There are three different forms:

* Where the operands are of different precisions or types.
* Where the operands are of the same type and precision, but yield a higher precision result.
* Where the operands or result are different but equivalent types (for example, types which differ only in their memory management).

[h4 Mixing Operands of Differing Types or Precision]

If the arguments to a binary operator are of different types or precision, then the operation is allowed
as long as there is an unambiguous implicit conversion from one argument type to the other.
In all cases the arithmetic is performed "as if" the lower precision type is promoted to the
higher precision type before applying the operator. However, particular backends may optimise
this and avoid actually creating a temporary if they are able to do so.

For example:

   mpfr_float_50         a(2), b;
   mpfr_float_100        c(3), d;
   static_mpfr_float_50  e(5), f;
   mpz_int               i(20);

   d = a * c;  // OK, result of the operation is an mpfr_float_100.
   b = a * c;  // Error, can't convert the result to an mpfr_float_50 as it will lose digits.
   f = e * i;  // OK, unambiguous conversion from mpz_int to static_mpfr_float_50

[h4 Operands of the Same Precision]

Sometimes you want to apply an operator to two arguments of the same precision in
such a way as to obtain a result of higher precision. The most common situation
occurs with fixed precision integers, where you want to multiply two N-bit numbers
to obtain a 2N-bit result. This is supported in this library by the following
free functions:

   template <class ResultType, class Source1, class Source2>
   ResultType& add(ResultType& result, const Source1& a, const Source2& b);

   template <class ResultType, class Source1, class Source2>
   ResultType& subtract(ResultType& result, const Source1& a, const Source2& b);

   template <class ResultType, class Source1, class Source2>
   ResultType& multiply(ResultType& result, const Source1& a, const Source2& b);

These functions apply the named operator to the arguments ['a] and ['b] and store the
result in ['result], returning ['result]. In all cases they behave "as if"
arguments ['a] and ['b] were first promoted to type `ResultType` before applying the
operator, though particular backends may well avoid that step by way of an optimization.

The type `ResultType` must be an instance of class `number`, and the types `Source1` and `Source2`
may be either instances of class `number` or native integer types. The latter is an optimization
that allows arithmetic to be performed on native integer types producing an extended precision result.

For example:

[mixed_eg]

Produces the output:

[mixed_output]

[h4 Mixing different, but "equivalent" types]

Ordinarily, mixing types of the same precision will produce a compiler error since there is
no unambiguous result type. However, there is a traits class:

   namespace boost{ namespace multiprecision{

   template <class NumberType1, class NumberType2>
   struct is_equivalent_number_type;

   }
   }

When its `value` member constant is `true`, the library will treat the types `NumberType1` and `NumberType2` as if
they are interchangeable. This is typically used to optimise memory management by using two types with differing
memory allocation strategies for different roles. Typically, we would be using a type with dynamic memory allocation and a minimal
memory footprint for the main storage type (think large arrays or matrices), but a type with internal storage and no dynamic
allocation (but a larger memory footprint) for a few select calculations.

There are three backends that define this trait by default:

* __cpp_int's, provided the two types differ only in their internal cache size.
* __cpp_bin_float's, provided they are of the same precision.
* __mpfr_float_backend's, provided they are of the same precision.

In addition, while this feature can be used with expression templates turned off, it minimises temporaries,
and hence memory allocations, when expression templates are turned on.

By way of an example, consider the dot product of two vectors of __cpp_int's. Our first, fairly trivial
implementation might look like this:

[dot_prod_1]

However, in order to reduce the need for memory allocations when constructing the temporaries needed
for the multiply-and-add operations, we could use an equivalent type with a larger internal cache like this:

[dot_prod_2]

Before we compare performance though, there is one other obvious thing we could try. By simply declaring
a variable for the result of the intermediate multiplications, and reusing that variable each time through
the loop, we might also expect to greatly reduce the number of allocations required.

[dot_prod_3]

We'll begin by comparing how many actual allocations were required to calculate the dot product of
1000-element vectors of random data with various bit counts:

[table
[[Bit Count][Allocation Count Version 1][Allocation Count Version 2][Allocation Count Version 3]]
[[32][1[footnote Here everything fits within __cpp_int's default internal cache, so no allocations are required.]][0][0]]
[[64][1001][1[footnote A single allocation for the return value.]][1]]
[[128][1002][1][2]]
[[256][1002][1][3[footnote Here the input data is such that more than one allocation is required for the temporary.]]]
[[512][1002][1][3]]
[[1024][1002][1001[footnote At this point we exceed the internal cache of our internal calculation type.]][3]]
]

Timings for the three methods are as follows (MSVC-16.8.0, x64):

[table
[[Bit Count][time/ms Version 1][time/ms Version 2][time/ms Version 3]]
[[32][0.021][0.021][0.021]]
[[64][0.032][0.032][0.029]]
[[128][0.099][0.041][0.041]]
[[256][0.154][0.091][0.094]]
[[512][0.323][0.270][0.269]]
[[1024][0.998][0.995][0.949]]
]

As you can see, there is a sweet spot for middling-sized integers where we gain: if the values are small, then
__cpp_int's own internal cache is large enough anyway, and no allocations occur. Conversely, if the values are
sufficiently large, then the cost of the actual arithmetic dwarfs the memory allocation time. In this particular
case, carefully writing the code (version 3) is clearly at least as good as using a separate type with a larger cache.
However, there may be times when it's not practical to rewrite existing code purely to optimise it for the
multiprecision use case.

A typical example where we can't rewrite our code to avoid unnecessary allocations occurs when we're calling an
external routine. For example, the arc length of an ellipse with radii ['a] and ['b] is given by:

[pre L(a, b) = 4aE(k)]

with:

[pre k = [sqrt](1 - b[super 2]/a[super 2])]

where ['E(k)] is the complete elliptic integral of the second kind, which is available as a template function `ellint_2` in Boost.Math.

Naively, we might implement this for use with __mpfr_float_backend like this:

[elliptic_arc1]

But we might also try mixing our arithmetic types: regular dynamically allocated __mpfr_float_backend's for the
interface, to minimise memory footprint in our external storage, and statically allocated __mpfr_float_backend's
for the internal arithmetic:

[elliptic_arc2]

The performance comparisons are surprisingly stark:

[table
[[N][`number<mpfr_float_backend<N>>` / ms][`number<mpfr_float_backend<N, allocate_stack>>` / ms]]
[[30][19.5][3.1]]
[[40][12.5][6.2]]
[[50][14.4][6.6]]
[[60][18.0][9.5]]
[[70][18.0][9.6]]
[[80][20.0][12.8]]
]

As before, the results are for MSVC-16.8.0/x64. In practice, the results do not always favour
non-allocating types so strongly: much depends on the special function being called and on the arguments used.

[h4 Backends With Optimized Mixed Precision Arithmetic]

The following backends have at least some direct support for mixed-precision arithmetic,
and therefore avoid creating unnecessary temporaries when using the interfaces above.
With these types it is more efficient to use mixed-precision arithmetic
than to explicitly cast the operands to the result type:

__mpfr_float_backend, __mpf_float, __cpp_int.

[endsect] [/section:mixed Mixed Precision Arithmetic]