131 lines
5.1 KiB
Plaintext
131 lines
5.1 KiB
Plaintext
//
|
|
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
|
|
//
|
|
// Distributed under the Boost Software License, Version 1.0. (See
|
|
// accompanying file LICENSE_1_0.txt or copy at
|
|
// http://www.boost.org/LICENSE_1_0.txt)
|
|
//
|
|
|
|
// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
|
|
/*!
|
|
\page std_locales Introduction to C++ Standard Library localization support
|
|
|
|
\section std_locales_basics Getting familiar with standard C++ Locales
|
|
|
|
The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the \c
|
|
std::locale class, the container that holds all the required information about a specific culture, such as number formatting
|
|
patterns, date and time formatting, currency, case conversion etc.
|
|
|
|
All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
|
|
packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
|
|
keeps reference counters on installed facets and can be efficiently copied.
|
|
|
|
Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
|
|
the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:
|
|
|
|
\code
|
|
std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
|
|
char upper_a = ctype_facet.toupper('a');
|
|
\endcode
|
|
|
|
A locale object can be imbued into an \c iostream so it would format information according to the locale:
|
|
|
|
\code
|
|
cout.imbue(std::locale("en_US.UTF-8"));
|
|
cout << 1345.45 << endl;
|
|
cout.imbue(std::locale("ru_RU.UTF-8"));
|
|
cout << 1345.45 << endl;
|
|
\endcode
|
|
|
|
Would display:
|
|
|
|
\verbatim
|
|
1,345.45 1.345,45
|
|
\endverbatim
|
|
|
|
You can also create your own facets and install them into existing locale objects. For example:
|
|
|
|
\code
|
|
class measure : public std::locale::facet {
|
|
public:
|
|
typedef enum { inches, ... } measure_type;
|
|
measure(measure_type m,size_t refs=0)
|
|
double from_metric(double value) const;
|
|
std::string name() const;
|
|
...
|
|
};
|
|
\endcode
|
|
And now you can simply provide this information to a locale:
|
|
|
|
\code
|
|
std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
|
|
/// Create default locale built from en_US locale and add paper size facet.
|
|
\endcode
|
|
|
|
|
|
Now you can print a distance according to the correct locale:
|
|
|
|
\code
|
|
void print_distance(std::ostream &out,double value)
|
|
{
|
|
measure const &m = std::use_facet<measure>(out.getloc());
|
|
// Fetch locale information from stream
|
|
out << m.from_metric(value) << " " << m.name();
|
|
}
|
|
\endcode
|
|
|
|
This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
|
|
the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.
|
|
|
|
\section std_locales_common Common Critical Problems with the Standard Library
|
|
|
|
There are numerous issues in the standard library that prevent the use of its full power, and there are several
|
|
additional issues:
|
|
|
|
- Setting the global locale has bad side effects.
|
|
\n
|
|
Consider following code:
|
|
\n
|
|
\code
|
|
int main()
|
|
{
|
|
std::locale::global(std::locale(""));
|
|
// Set system's default locale as global
|
|
std::ofstream csv("test.csv");
|
|
csv << 1.1 << "," << 1.3 << std::endl;
|
|
}
|
|
\endcode
|
|
\n
|
|
What would be the content of \c test.csv ? It may be "1.1,1.3" or it may be "1,1,1,3"
|
|
rather than what you had expected.
|
|
\n
|
|
More than that it affects even \c printf and libraries like \c boost::lexical_cast giving
|
|
incorrect or unexpected formatting. In fact many third-party libraries are broken in such a
|
|
situation.
|
|
\n
|
|
Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
|
|
even when it uses \c std based localization backends, so by default, numbers are always
|
|
formatted using C-style locale. Localized number formatting requires specific flags.
|
|
\n
|
|
- Number formatting is broken on some locales.
|
|
\n
|
|
Some locales use the non-breakable space u00A0 character for thousands separator, thus
|
|
in \c ru_RU.UTF-8 locale number 1024 should be displayed as "1 024" where the space
|
|
is a Unicode character with codepoint u00A0. Unfortunately many libraries don't handle
|
|
this correctly, for example GCC and SunStudio display a "\xC2" character instead of
|
|
the first character in the UTF-8 sequence "\xC2\xA0" that represents this code point, and
|
|
actually generate invalid UTF-8.
|
|
\n
|
|
- Locale names are not standardized. For example, under MSVC you need to provide the name
|
|
\c en-US or \c English_USA.1252 , when on POSIX platforms it would be \c en_US.UTF-8
|
|
or \c en_US.ISO-8859-1
|
|
\n
|
|
More than that, MSVC does not support UTF-8 locales at all.
|
|
\n
|
|
- Many standard libraries provide only the C and POSIX locales, thus GCC supports localization
|
|
only under Linux. On all other platforms, attempting to create locales other than "C" or
|
|
"POSIX" would fail.
|
|
|
|
*/
|
|
|