boost/libs/nowide/doc/html/index.html
2021-10-05 21:37:46 +02:00

235 lines
31 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.15"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Boost.Nowide: Boost.Nowide</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname">Boost.Nowide
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.15 -->
<script type="text/javascript" src="menudata.js"></script>
<script type="text/javascript" src="menu.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
$(function() {
initMenu('',false,false,'search.php','Search');
});
/* @license-end */</script>
<div id="main-nav"></div>
</div><!-- top -->
<div class="PageDoc"><div class="header">
<div class="headertitle">
<div class="title">Boost.Nowide </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p><a class="el" href="changelog_page.html">Changelog</a></p>
<p>Table of Contents:</p>
<ul>
<li><a class="el" href="index.html#main">What is Boost.Nowide</a></li>
<li><a class="el" href="index.html#main_rationale">Rationale</a><ul>
<li><a class="el" href="index.html#main_the_problem">The Problem</a></li>
<li><a class="el" href="index.html#main_the_solution">The Solution</a></li>
<li><a class="el" href="index.html#main_wide">Why Not Narrow and Wide?</a></li>
<li><a class="el" href="index.html#alternative">Alternatives</a></li>
<li><a class="el" href="index.html#main_reading">Further Reading</a></li>
</ul>
</li>
<li><a class="el" href="index.html#using">Using The Library</a><ul>
<li><a class="el" href="index.html#using_standard">Standard Features</a></li>
<li><a class="el" href="index.html#using_custom">Custom API</a></li>
<li><a class="el" href="index.html#using_integration">Integration with Boost.Filesystem</a></li>
</ul>
</li>
<li><a class="el" href="index.html#technical">Technical Details</a><ul>
<li><a class="el" href="index.html#technical_imple">Windows vs POSIX</a></li>
<li><a class="el" href="index.html#technical_cio">Console I/O</a></li>
</ul>
</li>
<li><a class="el" href="index.html#qna">Q &amp; A</a></li>
<li><a class="el" href="index.html#standalone_version">Standalone Version</a></li>
<li><a class="el" href="index.html#sources">Sources and Downloads</a></li>
</ul>
<h1><a class="anchor" id="main"></a>
What is Boost.Nowide</h1>
<p>Boost.Nowide is a library originally implemented by Artyom Beilis that makes cross platform Unicode aware programming easier.</p>
<p>The library provides an implementation of standard C and C++ library functions, such that their inputs are UTF-8 aware on Windows without requiring to use the Wide API. On Non-Windows/POSIX platforms the StdLib equivalents are aliased instead, so no conversion is performed there as UTF-8 is commonly used already.</p>
<p>Hence you can use the Boost.Nowide functions with the same name as their std counterparts with narrow strings on all platforms and just have it work.</p>
<h1><a class="anchor" id="main_rationale"></a>
Rationale</h1>
<h2><a class="anchor" id="main_the_problem"></a>
The Problem</h2>
<p>Consider a simple application that splits a big file into chunks, such that they can be sent by e-mail. It requires doing a few very simple tasks:</p>
<ul>
<li>Access command line arguments: <code>int main(int argc,char **argv)</code></li>
<li>Open an input file, open several output files: <code>std::fstream::open(const char*,std::ios::openmode m)</code></li>
<li>Remove the files in case of fault: <code>std::remove(const char* file)</code></li>
<li>Print a progress report onto the console: <code>std::cout &lt;&lt; file_name </code></li>
</ul>
<p>Unfortunately it is impossible to implement this simple task in plain C++ if the file names contain non-ASCII characters.</p>
<p>The simple program that uses the API would work on the systems that use UTF-8 internally &ndash; the vast majority of Unix-Line operating systems: Linux, Mac OS X, Solaris, BSD. But it would fail on files like <code>War and Peace - Война и мир - מלחמה ושלום.zip</code> under Microsoft Windows because the native Windows Unicode aware API is Wide-API &ndash; UTF-16.</p>
<p>This incredibly trivial task is very hard to implement in a cross platform manner.</p>
<h2><a class="anchor" id="main_the_solution"></a>
The Solution</h2>
<p>Boost.Nowide provides a set of standard library functions that are UTF-8 aware on Windows and make Unicode aware programming easier.</p>
<p>The library provides:</p>
<ul>
<li>Easy to use functions for converting UTF-8 to/from UTF-16</li>
<li>A class to make the <code>argc</code>, <code>argc</code> and <code>env</code> parameters of <code>main</code> use UTF-8</li>
<li>UTF-8 aware functions<ul>
<li><code>cstdio</code> functions:<ul>
<li><code>fopen</code> </li>
<li><code>freopen</code> </li>
<li><code>remove</code> </li>
<li><code>rename</code> </li>
</ul>
</li>
<li><code>cstdlib</code> functions:<ul>
<li><code>system</code> </li>
<li><code>getenv</code> </li>
<li><code>setenv</code> </li>
<li><code>unsetenv</code> </li>
<li><code>putenv</code> </li>
</ul>
</li>
<li><code>fstream</code> <ul>
<li><code>filebuf</code> </li>
<li><code>fstream/ofstream/ifstream</code> </li>
</ul>
</li>
<li><code>iostream</code> <ul>
<li><code>cout</code> </li>
<li><code>cerr</code> </li>
<li><code>clog</code> </li>
<li><code>cin</code> </li>
</ul>
</li>
</ul>
</li>
</ul>
<p>All these functions are available in Boost.Nowide in headers of the same name. So instead of including <code>cstdio</code> and using <code>std::fopen</code> you simply include <code><a class="el" href="cstdio_8hpp_source.html">boost/nowide/cstdio.hpp</a></code> and use <code><a class="el" href="namespaceboost_1_1nowide.html#a01e092c8625664d7b84db4492a4babc9" title="Same as fopen but file_name and mode are UTF-8 strings.">boost::nowide::fopen</a></code>. The functions accept the same arguments as their <code>std</code> counterparts, in fact on non-Windows builds they are just aliases for those. But on Windows Boost.Nowide does its magic: The narrow string arguments are interpreted as UTF-8, converted to wide strings (UTF-16) and passed to the wide API which handles special chars correctly.</p>
<p>If there are non-UTF-8 characters in the passed string, the conversion will replace them by a replacement character (default: <code>U+FFFD</code>) similar to what the NT kernel does. This means invalid UTF-8 sequences will not roundtrip from narrow-&gt;wide-&gt;narrow resulting in e.g. failure to open a file if the filename is ilformed.</p>
<h2><a class="anchor" id="main_wide"></a>
Why Not Narrow and Wide?</h2>
<p>Why not provide both Wide and Narrow implementations so the developer can choose to use Wide characters on Unix-like platforms?</p>
<p>Several reasons:</p>
<ul>
<li><code>wchar_t</code> is not really portable, it can be 2 bytes, 4 bytes or even 1 byte making Unicode aware programming harder</li>
<li>The C and C++ standard libraries use narrow strings for OS interactions. This library follows the same general rule. There is no such thing as <code>fopen(const wchar_t*, const wchar_t*)</code> in the standard library, so it is better to stick to the standards rather than re-implement Wide API in "Microsoft Windows Style"</li>
</ul>
<h2><a class="anchor" id="alternative"></a>
Alternatives</h2>
<p>Since the May 2019 update Windows 10 does support UTF-8 for narrow strings via a manifest file. So setting "UTF-8" as the active code page would allow using the narrow API without any other changes with UTF-8 encoded strings. See <a href="https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page">the documentation</a> for details.</p>
<p>Since April 2018 there is a (Beta) function available in Windows 10 to use UTF-8 code pages by default via a user setting.</p>
<p>Both methods do work but have a major drawback: They are not fully reliable for the app developer. The code page via manifest method falls back to a legacy code page when an older Windows version than 1903 is used. Hence it is only usable if the targetted system is Windows 10 after May 2019.</p>
<p>The second method relies on user interaction prior to starting the program. Obviously this is not reliable when expecting only UTF-8 in the code.</p>
<p>Hence under some circumstances (and hopefully always somewhen in the future) this library will not be required and even Windows I/O can be used with UTF-8 encoded text.</p>
<h2><a class="anchor" id="main_reading"></a>
Further Reading</h2>
<ul>
<li><a href="http://www.utf8everywhere.org/">www.utf8everywhere.org</a></li>
<li><a href="http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/">Windows console I/O approaches</a></li>
</ul>
<h1><a class="anchor" id="using"></a>
Using The Library</h1>
<h2><a class="anchor" id="using_standard"></a>
Standard Features</h2>
<p>As a developer you are expected to use <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementations of the standard library functions and classes such that they ...">boost::nowide</a></code> functions instead of the functions available in the <code>std</code> namespace.</p>
<p>For example, here is a Unicode unaware implementation of a line counter: </p><div class="fragment"><div class="line"><span class="preprocessor">#include &lt;fstream&gt;</span></div><div class="line"><span class="preprocessor">#include &lt;iostream&gt;</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div><div class="line">{</div><div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div><div class="line"> std::cerr &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"></div><div class="line"> std::ifstream f(argv[1]);</div><div class="line"> <span class="keywordflow">if</span>(!f) {</div><div class="line"> std::cerr &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div><div class="line"> <span class="keywordflow">while</span>(f) {</div><div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div><div class="line"> total_lines++;</div><div class="line"> }</div><div class="line"> f.close();</div><div class="line"> std::cout &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><p>To make this program handle Unicode properly, we do the following changes:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;boost/nowide/args.hpp&gt;</span></div><div class="line"><span class="preprocessor">#include &lt;boost/nowide/fstream.hpp&gt;</span></div><div class="line"><span class="preprocessor">#include &lt;boost/nowide/iostream.hpp&gt;</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div><div class="line">{</div><div class="line"> <a class="code" href="classboost_1_1nowide_1_1args.html">boost::nowide::args</a> a(argc,argv); <span class="comment">// Fix arguments - make them UTF-8</span></div><div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div><div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#ae2fae63f1575764692d8ee82fb5a30ee">boost::nowide::cerr</a> &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl; <span class="comment">// Unicode aware console</span></div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"></div><div class="line"> <a class="code" href="classboost_1_1nowide_1_1basic__ifstream.html">boost::nowide::ifstream</a> f(argv[1]); <span class="comment">// argv[1] - is UTF-8</span></div><div class="line"> <span class="keywordflow">if</span>(!f) {</div><div class="line"> <span class="comment">// the console can display UTF-8</span></div><div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#ae2fae63f1575764692d8ee82fb5a30ee">boost::nowide::cerr</a> &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div><div class="line"> <span class="keywordflow">while</span>(f) {</div><div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div><div class="line"> total_lines++;</div><div class="line"> }</div><div class="line"> f.close();</div><div class="line"> <span class="comment">// the console can display UTF-8</span></div><div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#ae41f002fbf0b6b1941cc5737f8c36d07">boost::nowide::cout</a> &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><p>This very simple and straightforward approach helps writing Unicode aware programs.</p>
<p>Watch the use of <code><a class="el" href="classboost_1_1nowide_1_1args.html" title="args is a class that temporarily replaces standard main() function arguments with their equal,...">boost::nowide::args</a></code>, <code><a class="el" href="namespaceboost_1_1nowide.html#af8267e8ed3bfb3638a32973b2523861b">boost::nowide::ifstream</a></code> and <code><a class="el" href="namespaceboost_1_1nowide.html#ae2fae63f1575764692d8ee82fb5a30ee" title="Same as std::cerr, but uses UTF-8.">boost::nowide::cerr</a>/cout</code>. On Non-Windows it does nothing, but on Windows the following happens:</p>
<ul>
<li><code><a class="el" href="classboost_1_1nowide_1_1args.html" title="args is a class that temporarily replaces standard main() function arguments with their equal,...">boost::nowide::args</a></code> retrieves UTF-16 arguments from the Windows API, converts them to UTF-8, and temporarily replaces the original <code>argv</code> (and optionally <code>env</code>) with pointers to those internally stored UTF-8 strings for the lifetime of the instance.</li>
<li><code><a class="el" href="namespaceboost_1_1nowide.html#af8267e8ed3bfb3638a32973b2523861b">boost::nowide::ifstream</a></code> converts the passed filename (which is now valid UTF-8) to UTF-16 and calls the Windows Wide API to open the file stream which can then be used as usual.</li>
<li>Similarily <code><a class="el" href="namespaceboost_1_1nowide.html#ae2fae63f1575764692d8ee82fb5a30ee" title="Same as std::cerr, but uses UTF-8.">boost::nowide::cerr</a></code> and <code><a class="el" href="namespaceboost_1_1nowide.html#ae41f002fbf0b6b1941cc5737f8c36d07" title="Same as std::cout, but uses UTF-8.">boost::nowide::cout</a></code> use an underlying stream buffer that converts the UTF-8 string to UTF-16 and use another Wide API function to write it to console.</li>
</ul>
<h2><a class="anchor" id="using_custom"></a>
Custom API</h2>
<p>Of course, this simple set of functions does not cover all needs. If you need to access Wide API from a Windows application that uses UTF-8 internally you can use the functions <code><a class="el" href="namespaceboost_1_1nowide.html#ac19f38bffa0370b15ac63c0480b85351">boost::nowide::widen</a></code> and <code><a class="el" href="namespaceboost_1_1nowide.html#a2c6cc8031fbf04a4d60a50a892d30906">boost::nowide::narrow</a></code>.</p>
<p>For example: </p><div class="fragment"><div class="line">CopyFileW( <a class="code" href="namespaceboost_1_1nowide.html#ac19f38bffa0370b15ac63c0480b85351">boost::nowide::widen</a>(existing_file).c_str(),</div><div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#ac19f38bffa0370b15ac63c0480b85351">boost::nowide::widen</a>(new_file).c_str(),</div><div class="line"> TRUE);</div></div><!-- fragment --><p>The conversion is done at the last stage, and you continue using UTF-8 strings everywhere else. You only switch to the Wide API at glue points.</p>
<p><code><a class="el" href="namespaceboost_1_1nowide.html#ac19f38bffa0370b15ac63c0480b85351">boost::nowide::widen</a></code> returns <code>std::string</code>. Sometimes it is useful to prevent allocation and use on-stack buffers instead. Boost.Nowide provides the <code><a class="el" href="classboost_1_1nowide_1_1basic__stackstring.html" title="A class that allows to create a temporary wide or narrow UTF strings from wide or narrow UTF source.">boost::nowide::basic_stackstring</a></code> class for this purpose.</p>
<p>The example above could be rewritten as:</p>
<div class="fragment"><div class="line"><a class="code" href="classboost_1_1nowide_1_1basic__stackstring.html">boost::nowide::basic_stackstring&lt;wchar_t,char,64&gt;</a> wexisting_file(existing_file), wnew_file(new_file);</div><div class="line">CopyFileW(wexisting_file.c_str(),wnew_file.c_str(),TRUE);</div></div><!-- fragment --><dl class="section note"><dt>Note</dt><dd>There are a few convenience typedefs: <code>stackstring</code> and <code>wstackstring</code> using 256-character buffers, and <code>short_stackstring</code> and <code>wshort_stackstring</code> using 16-character buffers. If the string is longer, they fall back to heap memory allocation.</dd></dl>
<h2><a class="anchor" id="using_windows_h"></a>
The windows.h header</h2>
<p>The library does not include the <code>windows.h</code> in order to prevent namespace pollution with numerous defines and types. Instead, the library defines the prototypes of the Win32 API functions.</p>
<p>However, you may request to use the <code>windows.h</code> header by defining <code>BOOST_USE_WINDOWS_H</code> before including any of the Boost.Nowide headers</p>
<h2><a class="anchor" id="using_integration"></a>
Integration with Boost.Filesystem</h2>
<p>Boost.Filesystem supports selection of narrow encoding. Unfortunatelly the default narrow encoding on Windows isn't UTF-8. But you can enable UTF-8 as default encoding on Boost.Filesystem by calling <code><a class="el" href="namespaceboost_1_1nowide.html#a00544a475ceab4163eb8c12ee80eb9ec">boost::nowide::nowide_filesystem()</a></code> in the beginning of your program which imbues a locale with a UTF-8 conversion facet to convert between <code>char</code> <code>wchar_t</code>. This interprets all narrow strings passed to and from <code>boost::filesystem::path</code> as UTF-8 when converting them to wide strings (as required for internal storage). On POSIX this has usually no effect, as no conversion is done due to narrow strings being used as the storage format.</p>
<h1><a class="anchor" id="technical"></a>
Technical Details</h1>
<h2><a class="anchor" id="technical_imple"></a>
Windows vs POSIX</h2>
<p>For Microsoft Windows, the library provides UTF-8 aware variants of some <code>std:</code>: functions in the <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementations of the standard library functions and classes such that they ...">boost::nowide</a></code> namespace. For example, <code>std::fopen</code> becomes <code><a class="el" href="namespaceboost_1_1nowide.html#a01e092c8625664d7b84db4492a4babc9" title="Same as fopen but file_name and mode are UTF-8 strings.">boost::nowide::fopen</a></code>.</p>
<p>Under POSIX platforms, the functions in <a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementations of the standard library functions and classes such that they ...">boost::nowide</a> are aliases of their standard library counterparts:</p>
<div class="fragment"><div class="line"><span class="keyword">namespace </span>boost {</div><div class="line"><span class="keyword">namespace </span>nowide {</div><div class="line"><span class="preprocessor">#ifdef BOOST_WINDOWS</span></div><div class="line"><span class="keyword">inline</span> FILE *<a class="code" href="namespaceboost_1_1nowide.html#a01e092c8625664d7b84db4492a4babc9">fopen</a>(<span class="keyword">const</span> <span class="keywordtype">char</span>* name, <span class="keyword">const</span> <span class="keywordtype">char</span>* mode)</div><div class="line">{</div><div class="line"> ...</div><div class="line">}</div><div class="line"><span class="preprocessor">#else</span></div><div class="line"><span class="keyword">using</span> std::fopen</div><div class="line"><span class="preprocessor">#endif</span></div><div class="line">} <span class="comment">// nowide</span></div><div class="line">} <span class="comment">// boost</span></div></div><!-- fragment --><p>There is also a <code>std::filebuf</code> compatible implementation provided for Windows which supports UTF-8 filepaths for <code>open</code> and behaves otherwise identical (API-wise).</p>
<p>On all systems the <code>std::fstream</code> class and friends are provided as custom implementations supporting <code>std::string</code> and <code>*::filesystem::path</code> as well as <code>wchar_t*</code> (Windows only) overloads for the constructor and <code>open</code>. This is done so users can use e.g. <code>boost::filesystem::path</code> with <code><a class="el" href="namespaceboost_1_1nowide.html#aec0866ca2b7608db63f9ae7564cae882">boost::nowide::fstream</a></code> without depending on C++17 support. Furthermore any path-like class is supported if it matches the interface of <code>std::filesystem::path</code> "enough".</p>
<p>Note that there is no universal support for <code>path</code> and <code>std::string</code> in <code><a class="el" href="namespaceboost_1_1nowide.html#aa2709df55bd08ca25c97f047afec0ba0" title="Convenience typedef.">boost::nowide::filebuf</a></code>. This is due to using the std variant on non-Windows systems which might be faster in some cases. As <code>filebuf</code> is rarely used by user code but rather indirectly through <code>fstream</code> not having string or path support seems a small price to pay especially as C++11 adds <code>std::string</code> support, C++17 <code>path</code> support and usage via <code>string_or_path.c_str()</code> is still possible and portable.</p>
<h2><a class="anchor" id="technical_cio"></a>
Console I/O</h2>
<p>Console I/O is implemented as a wrapper around ReadConsoleW/WriteConsoleW when the stream goes to the "real" console. When the stream was piped/redirected the standard <code>cin/cout</code> is used instead.</p>
<p>This approach eliminates a need of manual code page handling. If TrueType fonts are used the Unicode aware input and output works as intended.</p>
<h1><a class="anchor" id="qna"></a>
Q &amp; A</h1>
<p><b>Q: What happens to invalid UTF passed through Boost.Nowide? For example Windows using UCS-2 instead of UTF-16.</b></p>
<p>A: The policy of Boost.Nowide is to always yield valid UTF encoded strings. So invalid UTF characters are replaced by the replacement character <code>U+FFFD</code>.</p>
<p>This happens in both directions:<br />
When passing a (presumptly) UTF-8 encoded string to Boost.Nowide it will convert it to UTF-16 and replace every invalid character before passing it to the OS.<br />
On retrieval of a value from the OS (e.g. <code><a class="el" href="namespaceboost_1_1nowide.html#ac83059bedb191bbcb73637cd1224589c" title="UTF-8 aware getenv. Returns 0 if the variable is not set.">boost::nowide::getenv</a></code> or command line arguments through <code><a class="el" href="classboost_1_1nowide_1_1args.html" title="args is a class that temporarily replaces standard main() function arguments with their equal,...">boost::nowide::args</a></code>) the value is assumed to be UTF-16 and converted to UTF-8 replacing any invalid character.</p>
<p>This means that if one somehow manages to create an invalid UTF-16 filename in Windows it will be <b>impossible</b> to handle it with Boost.Nowide. But as Microsoft switched from UCS-2 (aka strings with arbitrary 2 Byte values) to UTF-16 in Windows 2000 it won't be a problem in most environments.</p>
<p><b>Q: What kind of error reporting is used?</b></p>
<p>A: There are in fact 3:</p>
<ul>
<li>Invalid UTF encoded strings are used by replacing invalid chars by the replacement character U+FFFD</li>
<li>API calls mirroring the standard API use the same error reporting as that, e.g. by returning a non-zero value on failure</li>
<li>Non-continuable errors are reported by standard exceptions. Main example is failure to get the command line parameters via the WinAPI</li>
</ul>
<p><b>Q: Why doesn't the library convert the string to/from the locale's encoding (instead of UTF-8) on POSIX systems?</b></p>
<p>A: It is inherently incorrect to convert strings to/from locale encodings on POSIX platforms.</p>
<p>You can create a file named "\xFF\xFF.txt" (invalid UTF-8), remove it, pass its name as a parameter to a program and it would work whether the current locale is UTF-8 or not. Also, changing the locale from let's say <code>en_US.UTF-8</code> to <code>en_US.ISO-8859-1</code> would not magically change all files in the OS or the strings a user may pass to the program (which is different on Windows)</p>
<p>POSIX OSs treat strings as <code>NULL</code> terminated cookies.</p>
<p>So altering their content according to the locale would actually lead to incorrect behavior.</p>
<p>For example, this is a naive implementation of a standard program "rm"</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;cstdio&gt;</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div><div class="line">{</div><div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i=1;i&lt;argc;i++)</div><div class="line"> std::remove(argv[i]);</div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><p>It would work with ANY locale and changing the strings would lead to incorrect behavior.</p>
<p>The meaning of a locale under POSIX and Windows platforms is different and has very different effects.</p>
<h2><a class="anchor" id="standalone_version"></a>
Standalone Version</h2>
<p>It is possible to use Nowide library without having the huge Boost project as a dependency. There is a standalone version that has all the functionality in the <code>nowide</code> namespace instead of <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementations of the standard library functions and classes such that they ...">boost::nowide</a></code>. The example above would look like</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;nowide/args.hpp&gt;</span></div><div class="line"><span class="preprocessor">#include &lt;nowide/fstream.hpp&gt;</span></div><div class="line"><span class="preprocessor">#include &lt;nowide/iostream.hpp&gt;</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div><div class="line">{</div><div class="line"> nowide::args a(argc,argv); <span class="comment">// Fix arguments - make them UTF-8</span></div><div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div><div class="line"> nowide::cerr &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl; <span class="comment">// Unicode aware console</span></div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"></div><div class="line"> nowide::ifstream f(argv[1]); <span class="comment">// argv[1] - is UTF-8</span></div><div class="line"> <span class="keywordflow">if</span>(!f) {</div><div class="line"> <span class="comment">// the console can display UTF-8</span></div><div class="line"> nowide::cerr &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open a file &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 1;</div><div class="line"> }</div><div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div><div class="line"> <span class="keywordflow">while</span>(f) {</div><div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div><div class="line"> total_lines++;</div><div class="line"> }</div><div class="line"> f.close();</div><div class="line"> <span class="comment">// the console can display UTF-8</span></div><div class="line"> nowide::cout &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><h2><a class="anchor" id="sources"></a>
Sources and Downloads</h2>
<p>The upstream sources can be found at GitHub: <a href="https://github.com/boostorg/nowide">https://github.com/boostorg/nowide</a></p>
<p>You can download the latest sources there:</p>
<ul>
<li>Standard Version: <a href="https://github.com/boostorg/nowide/archive/master.zip">nowide-master.zip</a> </li>
</ul>
</div></div><!-- PageDoc -->
</div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.15
</small></address>
</body>
</html>