<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
  <meta http-equiv="Content-Language" content="en-us">
  <meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
  <meta name="GENERATOR" content="Microsoft FrontPage 6.0">
  <meta name="ProgId" content="FrontPage.Editor.Document">

  <title>Boost Char Separator</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink="#FF0000">
  <p><img src="../../boost.png" alt="C++ Boost" width="277" height="86"><br></p>

  <h1>char_separator&lt;Char, Traits&gt;</h1>

  <p>The <tt>char_separator</tt> class breaks a sequence of characters into
  tokens based on character delimiters, much as <tt>strtok()</tt> does, but
  without the hazards of non-reentrancy and destruction of the input
  sequence.</p>

  <p>The <tt>char_separator</tt> class is used in conjunction with the
  <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
  <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p>

  <h2>Definitions</h2>

  <p>The <tt>strtok()</tt> function does not include matches with the
  character delimiters in the output sequence of tokens. However, it is
  sometimes useful to have the delimiters show up in the output sequence, so
  <tt>char_separator</tt> provides this as an option. We refer to delimiters
  that show up as output tokens as <b><i>kept delimiters</i></b> and to
  delimiters that do not show up as output tokens as <b><i>dropped
  delimiters</i></b>.</p>

  <p>When two delimiters appear next to each other in the input sequence,
  there is the question of whether to output an <b><i>empty token</i></b> or
  to skip ahead. The behaviour of <tt>strtok()</tt> is to skip ahead. The
  <tt>char_separator</tt> class provides both options.</p>
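
  <p>The two definitions above can be summarized in a small model. The
  following is a simplified sketch in plain C++, not the Boost
  implementation; <tt>split_model</tt> and its parameters are hypothetical
  names used only for illustration. It assumes that every delimiter ends the
  current token, that a kept delimiter is then emitted as a one-character
  token of its own, and that empty tokens are emitted only on request.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: a simplified model of
// kept delimiters, dropped delimiters, and empty tokens.
std::vector<std::string> split_model(const std::string& input,
                                     const std::string& dropped,
                                     const std::string& kept,
                                     bool keep_empty)
{
  std::vector<std::string> out;
  std::string field;  // characters collected since the last delimiter

  // Iterate one position past the end so the final field is flushed too.
  for (std::string::size_type i = 0; i <= input.size(); ++i) {
    const bool at_end = (i == input.size());
    const bool is_kept =
        !at_end && kept.find(input[i]) != std::string::npos;
    const bool is_dropped =
        !at_end && dropped.find(input[i]) != std::string::npos;

    if (at_end || is_kept || is_dropped) {
      // A delimiter (or the end of input) finishes the current token.
      if (keep_empty || !field.empty())
        out.push_back(field);
      field.clear();
      // A kept delimiter also appears as a token of its own.
      if (is_kept)
        out.push_back(std::string(1, input[i]));
    } else {
      field += input[i];
    }
  }
  return out;
}
```

  <p>Under this model, <tt>split_model("a,,b", ",", "", false)</tt> yields the
  tokens <tt>a</tt> and <tt>b</tt>, whereas passing <tt>true</tt> as the last
  argument yields <tt>a</tt>, an empty token, and <tt>b</tt>.</p>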

  <h2>Examples</h2>

  <p>The first example shows how to use <tt>char_separator</tt> as a
  replacement for the <tt>strtok()</tt> function. We specify three character
  delimiters, none of which will show up as output tokens. We do not specify
  any kept delimiters, and by default any empty tokens are ignored.</p>

  <blockquote>
    <pre>
// char_sep_example_1.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  // All three characters are dropped delimiters.
  boost::char_separator&lt;char&gt; sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt;
</pre>
  </blockquote>

  <p>The next example shows tokenizing with two dropped delimiters '-' and
  ';' and a single kept delimiter '|'. We also specify that empty tokens
  should show up in the output when two delimiters are next to each
  other.</p>

  <blockquote>
    <pre>
// char_sep_example_2.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  // '-' and ';' are dropped, '|' is kept, and empty tokens are reported.
  boost::char_separator&lt;char&gt; sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
  </blockquote>

  <p>The final example shows tokenizing on punctuation and whitespace
  characters using the default constructor of the
  <tt>char_separator</tt>.</p>

  <blockquote>
    <pre>
// char_sep_example_3.cpp
#include &lt;cstdlib&gt;   // EXIT_SUCCESS
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;

int main()
{
  std::string str = "This is, a test";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
  boost::char_separator&lt;char&gt; sep; // default constructed
  Tok tok(str, sep);
  for (Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
  </blockquote>The output is:

  <blockquote>
    <pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt;
</pre>
  </blockquote>

  <h2>Template parameters</h2>

  <table border summary="">
    <tr>
      <th>Parameter</th>

      <th>Description</th>

      <th>Default</th>
    </tr>

    <tr>
      <td><tt>Char</tt></td>

      <td>The type of elements within a token, typically <tt>char</tt>.</td>

      <td>&nbsp;</td>
    </tr>

    <tr>
      <td><tt>Traits</tt></td>

      <td>The <tt>char_traits</tt> for the character type.</td>

      <td><tt>char_traits&lt;char&gt;</tt></td>
    </tr>
  </table>

  <h2>Model of</h2><a href="tokenizerfunction.htm">Tokenizer Function</a>

  <h2>Members</h2>
  <hr>
  <pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

  <p>This creates a <tt>char_separator</tt> object, which can then be used to
  create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
  <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The
  <tt>dropped_delims</tt> and <tt>kept_delims</tt> arguments are strings of
  characters in which each character is used as a delimiter during
  tokenizing. Whenever a delimiter is seen in the input sequence, the current
  token is finished and a new token begins. The delimiters in
  <tt>dropped_delims</tt> do not show up as tokens in the output, whereas the
  delimiters in <tt>kept_delims</tt> do show up as tokens. If
  <tt>empty_tokens</tt> is <tt>drop_empty_tokens</tt>, then empty tokens will
  not show up in the output. If <tt>empty_tokens</tt> is
  <tt>keep_empty_tokens</tt>, then empty tokens will show up in the
  output.</p>
  <hr>
  <pre>
explicit char_separator()
</pre>

  <p>This creates a <tt>char_separator</tt> object in which the function
  <tt>std::isspace()</tt> is used to identify dropped delimiters and
  <tt>std::ispunct()</tt> is used to identify kept delimiters. In addition,
  empty tokens are dropped.</p>
  <hr>
  <pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>

  <p>This function is called by the <a href=
  "token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing. The
  user typically does not call this function directly.</p>
  <hr>

  <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
  "../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional"
  height="31" width="88"></a></p>

  <p>Revised
  <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
  December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>

  <p><i>Copyright &copy; 2001-2002 Jeremy Siek and John R. Bandela</i></p>

  <p><i>Distributed under the Boost Software License, Version 1.0. (See
  accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
  copy at <a href=
  "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>