117 lines
14 KiB
HTML
117 lines
14 KiB
HTML
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
|
<title>Reciprocal square root</title>
|
|
<link rel="stylesheet" href="../../math.css" type="text/css">
|
|
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
|
<link rel="home" href="../../index.html" title="Math Toolkit 3.0.0">
|
|
<link rel="up" href="../powers.html" title="Basic Functions">
|
|
<link rel="prev" href="ct_pow.html" title="Compile Time Power of a Runtime Base">
|
|
<link rel="next" href="../sinc.html" title="Sinus Cardinal and Hyperbolic Sinus Cardinal Functions">
|
|
</head>
|
|
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
|
<table cellpadding="2" width="100%"><tr>
|
|
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
|
|
<td align="center"><a href="../../../../../../index.html">Home</a></td>
|
|
<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
|
|
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
|
|
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
|
|
<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
|
|
</tr></table>
|
|
<hr>
|
|
<div class="spirit-nav">
|
|
<a accesskey="p" href="ct_pow.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../powers.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../sinc.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
|
|
</div>
|
|
<div class="section">
|
|
<div class="titlepage"><div><div><h3 class="title">
|
|
<a name="math_toolkit.powers.rsqrt"></a><a class="link" href="rsqrt.html" title="Reciprocal square root">Reciprocal square root</a>
|
|
</h3></div></div></div>
|
|
<h5>
|
|
<a name="math_toolkit.powers.rsqrt.h0"></a>
|
|
<span class="phrase"><a name="math_toolkit.powers.rsqrt.synopsis"></a></span><a class="link" href="rsqrt.html#math_toolkit.powers.rsqrt.synopsis">Synopsis</a>
|
|
</h5>
|
|
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">special_functions</span><span class="special">/</span><span class="identifier">rsqrt</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
|
|
|
|
<span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span> <span class="special">{</span>
|
|
|
|
<span class="keyword">template</span><span class="special"><</span><span class="keyword">class</span> <span class="identifier">Real</span><span class="special">></span>
|
|
<span class="identifier">Real</span> <span class="identifier">rsqrt</span><span class="special">(</span><span class="identifier">Real</span> <span class="keyword">const</span> <span class="special">&</span> <span class="identifier">x</span><span class="special">);</span>
|
|
|
|
<span class="special">}</span> <span class="comment">// namespaces</span>
|
|
</pre>
|
|
<p>
|
|
The function <code class="computeroutput"><span class="identifier">rsqrt</span></code> computes
|
|
the reciprocal square root 1/√<span class="emphasis"><em>x</em></span>. Those in the game programming
|
|
community might suspect this is a fast, low precision wrapper around the
|
|
<a href="https://www.felixcloutier.com/x86/rsqrtss" target="_top">rsqrtss</a> instruction.
|
|
This is not correct: We <span class="emphasis"><em>tried</em></span> this instruction, but
|
|
found no performance benefit to using it. However, the <span class="emphasis"><em>trick</em></span>
|
|
of computing a low precision reciprocal square root and then bootstrapping
|
|
to higher precision via Newton's method <span class="emphasis"><em>does</em></span> work, but
|
|
it only yields a performance benefit for quad and higher precision. We do
|
|
of course allow you to use <code class="computeroutput"><span class="identifier">rsqrt</span></code>
|
|
for <code class="computeroutput"><span class="keyword">float</span></code>, <code class="computeroutput"><span class="keyword">double</span></code>,
|
|
and <code class="computeroutput"><span class="keyword">long</span> <span class="keyword">double</span></code>,
|
|
but be aware there is no performance benefit to doing so. However, the savings
|
|
for quad precision and higher are very significant.
|
|
</p>
|
|
<p>
|
|
The use is
|
|
</p>
|
|
<pre class="programlisting"><span class="keyword">using</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">multiprecision</span><span class="special">::</span><span class="identifier">float128</span><span class="special">;</span>
|
|
<span class="identifier">float128</span> <span class="identifier">x</span> <span class="special">=</span> <span class="number">0.1</span><span class="identifier">Q</span><span class="special">;</span>
|
|
<span class="identifier">float128</span> <span class="identifier">y</span> <span class="special">=</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">rsqrt</span><span class="special">(</span><span class="identifier">x</span><span class="special">);</span>
|
|
</pre>
|
|
<p>
|
|
The reciprocal square root of +∞ is zero, and the reciprocal square
|
|
root of a NaN is a NaN.
|
|
</p>
|
|
<p>
|
|
<span class="inlinemediaobject"><object type="image/svg+xml" data="../../../graphs/rsqrt_quad_0_100.svg"></object></span>
|
|
</p>
|
|
<p>
|
|
Performance:
|
|
</p>
|
|
<pre class="programlisting"><span class="identifier">Running</span> <span class="special">./</span><span class="identifier">reporting</span><span class="special">/</span><span class="identifier">performance</span><span class="special">/</span><span class="identifier">rsqrt_performance</span><span class="special">.</span><span class="identifier">x</span>
|
|
<span class="identifier">Run</span> <span class="identifier">on</span> <span class="special">(</span><span class="number">16</span> <span class="identifier">X</span> <span class="number">4300</span> <span class="identifier">MHz</span> <span class="identifier">CPU</span> <span class="identifier">s</span><span class="special">)</span>
|
|
<span class="identifier">CPU</span> <span class="identifier">Caches</span><span class="special">:</span>
|
|
<span class="identifier">L1</span> <span class="identifier">Data</span> <span class="number">32</span> <span class="identifier">KiB</span> <span class="special">(</span><span class="identifier">x8</span><span class="special">)</span>
|
|
<span class="identifier">L1</span> <span class="identifier">Instruction</span> <span class="number">32</span> <span class="identifier">KiB</span> <span class="special">(</span><span class="identifier">x8</span><span class="special">)</span>
|
|
<span class="identifier">L2</span> <span class="identifier">Unified</span> <span class="number">1024</span> <span class="identifier">KiB</span> <span class="special">(</span><span class="identifier">x8</span><span class="special">)</span>
|
|
<span class="identifier">L3</span> <span class="identifier">Unified</span> <span class="number">11264</span> <span class="identifier">KiB</span> <span class="special">(</span><span class="identifier">x1</span><span class="special">)</span>
|
|
<span class="identifier">Load</span> <span class="identifier">Average</span><span class="special">:</span> <span class="number">0.43</span><span class="special">,</span> <span class="number">0.49</span><span class="special">,</span> <span class="number">0.46</span>
|
|
<span class="special">----------------------------------------------------------------------------------</span>
|
|
<span class="identifier">Benchmark</span> <span class="identifier">Time</span> <span class="identifier">CPU</span> <span class="identifier">Iterations</span>
|
|
<span class="special">----------------------------------------------------------------------------------</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="keyword">float</span><span class="special">></span> <span class="number">1.35</span> <span class="identifier">ns</span> <span class="number">1.35</span> <span class="identifier">ns</span> <span class="number">503364351</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="keyword">double</span><span class="special">></span> <span class="number">2.25</span> <span class="identifier">ns</span> <span class="number">2.25</span> <span class="identifier">ns</span> <span class="number">309753242</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="keyword">long</span> <span class="keyword">double</span><span class="special">></span> <span class="number">2.68</span> <span class="identifier">ns</span> <span class="number">2.68</span> <span class="identifier">ns</span> <span class="number">261382652</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">float128</span><span class="special">></span> <span class="number">182</span> <span class="identifier">ns</span> <span class="number">182</span> <span class="identifier">ns</span> <span class="number">3756956</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">number</span><span class="special"><</span><span class="identifier">mpfr_float_backend</span><span class="special"><</span><span class="number">100</span><span class="special">>>></span> <span class="number">299</span> <span class="identifier">ns</span> <span class="number">299</span> <span class="identifier">ns</span> <span class="number">2494027</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">number</span><span class="special"><</span><span class="identifier">mpfr_float_backend</span><span class="special"><</span><span class="number">200</span><span class="special">>>></span> <span class="number">412</span> <span class="identifier">ns</span> <span class="number">412</span> <span class="identifier">ns</span> <span class="number">1589284</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">number</span><span class="special"><</span><span class="identifier">mpfr_float_backend</span><span class="special"><</span><span class="number">300</span><span class="special">>>></span> <span class="number">617</span> <span class="identifier">ns</span> <span class="number">617</span> <span class="identifier">ns</span> <span class="number">1067473</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">number</span><span class="special"><</span><span class="identifier">mpfr_float_backend</span><span class="special"><</span><span class="number">400</span><span class="special">>>></span> <span class="number">812</span> <span class="identifier">ns</span> <span class="number">812</span> <span class="identifier">ns</span> <span class="number">830564</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">number</span><span class="special"><</span><span class="identifier">mpfr_float_backend</span><span class="special"><</span><span class="number">1000</span><span class="special">>>></span> <span class="number">3183</span> <span class="identifier">ns</span> <span class="number">3183</span> <span class="identifier">ns</span> <span class="number">221079</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">cpp_bin_float_50</span><span class="special">></span> <span class="number">4321</span> <span class="identifier">ns</span> <span class="number">4321</span> <span class="identifier">ns</span> <span class="number">163243</span>
|
|
<span class="identifier">Rsqrt</span><span class="special"><</span><span class="identifier">cpp_bin_float_100</span><span class="special">></span> <span class="number">9393</span> <span class="identifier">ns</span> <span class="number">9393</span> <span class="identifier">ns</span> <span class="number">72967</span>
|
|
</pre>
|
|
</div>
|
|
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
|
|
<td align="left"></td>
|
|
<td align="right"><div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
|
|
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
|
|
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
|
|
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
|
|
Walker and Xiaogang Zhang<p>
|
|
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
|
|
</p>
|
|
</div></td>
|
|
</tr></table>
|
|
<hr>
|
|
<div class="spirit-nav">
|
|
<a accesskey="p" href="ct_pow.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../powers.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../sinc.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
|
|
</div>
|
|
</body>
|
|
</html>
|