
Daniel Walker wrote:
On 4/24/06, Marcin Kalicinski <kalita@poczta.onet.pl> wrote:
Right. FWIW, I think Dan Nuffer's XML parser is not a hack. The spirit XML parsers implement the full XML grammar.
http://spirit.sourceforge.net/repository/applications/show_contents.php
My knowledge of XML is limited, but I think Dan Nuffer's parser will parse any valid XML. read_xml, however, discards everything beyond nodes, attributes, data and comments.
Isn't the property_tree XML parser originally based on Dan Nuffer's? Couldn't the productions/tokens from the Nuffer parser be added back to read_xml() so that it could at least accept the syntax for all XML files even if it doesn't implement the semantics? I think the runtime overhead of the additional productions in the grammar would be negligible for simple XML files that don't use the features and necessary for XML files that do. It seems to me this could clarify the scope of the parser. The documentation could read something like:
"read_xml() performs non-validating parsing of the W3C recommendation XML 1.1. In addition, as of version 1.3x, read_xml() parses but ignores the following W3C specifications: XML Names, XInclude, XLink/XPointer, XML Schema, XSLT, ..."
... changing version numbers as appropriate. It may also simplify maintenance, since bug fixes and enhancements could be pulled from the Nuffer parser code base into property_tree.
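To illustrate the "accept the syntax, ignore the semantics" idea, here is a minimal standalone C++ sketch. It is not property_tree or Spirit code: the names skip_ignored and strip_ignored are hypothetical, and it assumes that only processing instructions (including the XML declaration) and DOCTYPE declarations are to be recognised and dropped. Comments are kept, since read_xml keeps them. CDATA sections and comments inside a DOCTYPE internal subset are not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: if a "parsed but ignored" construct starts at pos,
// return the index just past it; otherwise return pos unchanged.
std::size_t skip_ignored(const std::string& xml, std::size_t pos) {
    assert(pos <= xml.size());
    // Processing instruction or XML declaration: <? ... ?>
    if (xml.compare(pos, 2, "<?") == 0) {
        std::size_t end = xml.find("?>", pos + 2);
        if (end == std::string::npos)
            throw std::runtime_error("unterminated processing instruction");
        return end + 2;
    }
    // Document type declaration, with an optional [ internal subset ].
    if (xml.compare(pos, 9, "<!DOCTYPE") == 0) {
        bool in_subset = false;  // between '[' and ']'
        char quote = 0;          // active quote character, 0 if none
        for (std::size_t i = pos + 9; i < xml.size(); ++i) {
            char c = xml[i];
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '[') {
                in_subset = true;
            } else if (c == ']') {
                in_subset = false;
            } else if (c == '>' && !in_subset) {
                return i + 1;
            }
        }
        throw std::runtime_error("unterminated DOCTYPE declaration");
    }
    return pos;
}

// Hypothetical front end: returns the document with the ignored constructs
// removed, so a simpler parser only sees elements, attributes, data and
// comments.
std::string strip_ignored(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        std::size_t next = skip_ignored(xml, pos);
        if (next != pos) {
            pos = next;  // recognised and dropped
        } else if (xml.compare(pos, 4, "<!--") == 0) {
            // Copy comments verbatim so "<?" inside them is left alone.
            std::size_t end = xml.find("-->", pos + 4);
            if (end == std::string::npos)
                throw std::runtime_error("unterminated comment");
            out.append(xml, pos, end + 3 - pos);
            pos = end + 3;
        } else {
            out += xml[pos++];
        }
    }
    return out;
}
```

A real grammar would do this inside the parser rather than as a pre-pass, but the effect is the same: the extra productions cost one prefix comparison per position on files that never use them.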
Maybe what is needed here is two functions:

read_simple_xml(...);
read_complex_xml(...);

-Thorsten