138 lines
7.3 KiB
HTML
138 lines
7.3 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||
<html>
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||
<title>Implementation Rationale</title>
|
||
<link rel="stylesheet" href="../../../doc/src/boostbook.css" type="text/css">
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||
<link rel="home" href="../index.html" title="The Boost C++ Libraries BoostBook Documentation Subset">
|
||
<link rel="up" href="../unordered.html" title="Chapter 45. Boost.Unordered">
|
||
<link rel="prev" href="compliance.html" title="Standard Compliance">
|
||
<link rel="next" href="changes.html" title="Change Log">
|
||
</head>
|
||
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
||
<table cellpadding="2" width="100%"><tr>
|
||
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../boost.png"></td>
|
||
<td align="center"><a href="../../../index.html">Home</a></td>
|
||
<td align="center"><a href="../../../libs/libraries.htm">Libraries</a></td>
|
||
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
|
||
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
|
||
<td align="center"><a href="../../../more/index.htm">More</a></td>
|
||
</tr></table>
|
||
<hr>
|
||
<div class="spirit-nav">
|
||
<a accesskey="p" href="compliance.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../unordered.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="changes.html"><img src="../../../doc/src/images/next.png" alt="Next"></a>
|
||
</div>
|
||
<div class="section">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="unordered.rationale"></a><a class="link" href="rationale.html" title="Implementation Rationale">Implementation Rationale</a>
|
||
</h2></div></div></div>
|
||
<p>
|
||
The intent of this library is to implement the unordered containers in the
|
||
draft standard, so the interface was fixed. But there are still some implementation
|
||
decisions to make. The priorities are conformance to the standard and portability.
|
||
</p>
|
||
<p>
|
||
The <a href="http://en.wikipedia.org/wiki/Hash_table" target="_top">Wikipedia article
|
||
on hash tables</a> has a good summary of the implementation issues for
|
||
hash tables in general.
|
||
</p>
|
||
<h3>
|
||
<a name="unordered.rationale.h0"></a>
|
||
<span class="phrase"><a name="unordered.rationale.data_structure"></a></span><a class="link" href="rationale.html#unordered.rationale.data_structure">Data
|
||
Structure</a>
|
||
</h3>
|
||
<p>
|
||
By specifying an interface for accessing the buckets of the container the standard
|
||
pretty much requires that the hash table uses chained addressing.
|
||
</p>
|
||
<p>
|
||
It would be conceivable to write a hash table that uses another method. For
|
||
example, it could use open addressing, and use the lookup chain to act as a
|
||
bucket but there are some serious problems with this:
|
||
</p>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
|
||
<li class="listitem">
|
||
The draft standard requires that pointers to elements aren't invalidated,
|
||
so the elements can't be stored in one array, but will need a layer of
|
||
indirection instead - losing the efficiency and most of the memory gain,
|
||
the main advantages of open addressing.
|
||
</li>
|
||
<li class="listitem">
|
||
Local iterators would be very inefficient and may not be able to meet the
|
||
complexity requirements.
|
||
</li>
|
||
<li class="listitem">
|
||
There are also the restrictions on when iterators can be invalidated. Since
|
||
open addressing degrades badly when there are a high number of collisions
|
||
the restrictions could prevent a rehash when it's really needed. The maximum
|
||
load factor could be set to a fairly low value to work around this - but
|
||
the standard requires that it is initially set to 1.0.
|
||
</li>
|
||
<li class="listitem">
|
||
And since the standard is written with a eye towards chained addressing,
|
||
users will be surprised if the performance doesn't reflect that.
|
||
</li>
|
||
</ul></div>
|
||
<p>
|
||
So chained addressing is used.
|
||
</p>
|
||
<h3>
|
||
<a name="unordered.rationale.h1"></a>
|
||
<span class="phrase"><a name="unordered.rationale.number_of_buckets"></a></span><a class="link" href="rationale.html#unordered.rationale.number_of_buckets">Number
|
||
of Buckets</a>
|
||
</h3>
|
||
<p>
|
||
There are two popular methods for choosing the number of buckets in a hash
|
||
table. One is to have a prime number of buckets, another is to use a power
|
||
of 2.
|
||
</p>
|
||
<p>
|
||
Using a prime number of buckets, and choosing a bucket by using the modulus
|
||
of the hash function's result will usually give a good result. The downside
|
||
is that the required modulus operation is fairly expensive. This is what the
|
||
containers do in most cases.
|
||
</p>
|
||
<p>
|
||
Using a power of 2 allows for much quicker selection of the bucket to use,
|
||
but at the expense of losing the upper bits of the hash value. For some specially
|
||
designed hash functions it is possible to do this and still get a good result
|
||
but as the containers can take arbitrary hash functions this can't be relied
|
||
on.
|
||
</p>
|
||
<p>
|
||
To avoid this a transformation could be applied to the hash function, for an
|
||
example see <a href="http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm" target="_top">Thomas
|
||
Wang's article on integer hash functions</a>. Unfortunately, a transformation
|
||
like Wang's requires knowledge of the number of bits in the hash value, so
|
||
it isn't portable enough to use as a default. It can applicable in certain
|
||
cases so the containers have a policy based implementation that can use this
|
||
alternative technique.
|
||
</p>
|
||
<p>
|
||
Currently this is only done on 64 bit architectures, where prime number modulus
|
||
can be expensive. Although this varies depending on the architecture, so I
|
||
probably should revisit it.
|
||
</p>
|
||
<p>
|
||
I'm also thinking of introducing a mechanism whereby a hash function can indicate
|
||
that it's safe to be used directly with power of 2 buckets, in which case a
|
||
faster plain power of 2 implementation can be used.
|
||
</p>
|
||
</div>
|
||
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
|
||
<td align="left"></td>
|
||
<td align="right"><div class="copyright-footer">Copyright © 2003, 2004 Jeremy B. Maitin-Shepard<br>Copyright © 2005-2008 Daniel
|
||
James<p>
|
||
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
||
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
|
||
</p>
|
||
</div></td>
|
||
</tr></table>
|
||
<hr>
|
||
<div class="spirit-nav">
|
||
<a accesskey="p" href="compliance.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../unordered.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="changes.html"><img src="../../../doc/src/images/next.png" alt="Next"></a>
|
||
</div>
|
||
</body>
|
||
</html>
|