55 lines
3.1 KiB
Plaintext
55 lines
3.1 KiB
Plaintext
[/
|
|
/ Copyright (c) 2008 Marcin Kalicinski (kalita <at> poczta dot onet dot pl)
|
|
/ Copyright (c) 2009 Sebastian Redl (sebastian dot redl <at> getdesigned dot at)
|
|
/
|
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
/]
|
|
[section XML Parser]
|
|
[def __xml__ [@http://en.wikipedia.org/wiki/XML XML format]]
|
|
[def __xml_parser.hpp__ [headerref boost/property_tree/xml_parser.hpp xml_parser.hpp]]
|
|
[def __RapidXML__ [@http://rapidxml.sourceforge.net/ RapidXML]]
|
|
[def __boost__ [@http://www.boost.org Boost]]
|
|
The __xml__ is an industry standard for storing information in textual
|
|
form. Unfortunately, there is no XML parser in __boost__ as of the
|
|
time of this writing. The library therefore contains the fast and tiny
|
|
__RapidXML__ parser (currently in version 1.13) to provide XML parsing support.
|
|
RapidXML does not fully support the XML standard; it is not capable of parsing
|
|
DTDs and therefore cannot do full entity substitution.
|
|
|
|
By default, the parser will preserve most whitespace, but remove element content
|
|
that consists only of whitespace. Encoded whitespaces (e.g.  ) does not
|
|
count as whitespace in this regard. You can pass the trim_whitespace flag if you
|
|
want all leading and trailing whitespace trimmed and all continuous whitespace
|
|
collapsed into a single space.
|
|
|
|
Please note that RapidXML does not understand the encoding specification. If
|
|
you pass it a character buffer, it assumes the data is already correctly
|
|
encoded; if you pass it a filename, it will read the file using the character
|
|
conversion of the locale you give it (or the global locale if you give it none).
|
|
This means that, in order to parse a UTF-8-encoded XML file into a wptree, you
|
|
have to supply an alternate locale, either directly or by replacing the global
|
|
one.
|
|
|
|
XML / property tree conversion schema (__read_xml__ and __write_xml__):
|
|
|
|
* Each XML element corresponds to a property tree node. The child elements
|
|
correspond to the children of the node.
|
|
* The attributes of an XML element are stored in the subkey [^<xmlattr>]. There
|
|
is one child node per attribute in the attribute node. Existence of the
|
|
[^<xmlattr>] node is not guaranteed or necessary when there are no attributes.
|
|
* XML comments are stored in nodes named [^<xmlcomment>], unless comment
|
|
ignoring is enabled via the flags.
|
|
* Text content is stored in one of two ways, depending on the flags. The default
|
|
way concatenates all text nodes and stores them as the data of the element
|
|
node. This way, the entire content can be conveniently read, but the
|
|
relative ordering of text and child elements is lost. The other way stores
|
|
each text content as a separate node, all called [^<xmltext>].
|
|
|
|
The XML storage encoding does not round-trip perfectly. A read-write cycle loses
|
|
trimmed whitespace, low-level formatting information, and the distinction
|
|
between normal data and CDATA nodes. Comments are only preserved when enabled.
|
|
A write-read cycle loses trimmed whitespace; that is, if the origin tree has
|
|
string data that starts or ends with whitespace, that whitespace is lost.
|
|
[endsect] [/xml_parser]
|