
Mathias Gaunard wrote:
Stefan Seefeld wrote:
I would appreciate if anybody interested into a future boost.xml submission would have a look, provide feedback, or even get involved into the (ongoing) development.
I have to admit I am a bit confused by the interface.
For example, the first thing I see is that all nodes are actually pointers.
What's the best solution to this, and the memory management issues in general? It's difficult, because a node could be a simple text node with just one character in it, or it could be the root of a multi-megabyte document. Clearly in the first case it's OK to copy the data, while in the second case you would want to avoid deep copying in some way, e.g. with smart pointers and/or some sort of copy-on-write system. (Thinking aloud) I can imagine a tree where the top levels have different types from the bottom ones. I have just dug out an old email that I wrote to the xmlwrapp list in 2004. I had observed that xmlwrapp, which is a C++ wrapper around libxml2 with much in common with Stefan's proposal, had an overhead of about 150 bytes per node. I had been getting binary data from a database (with negligible space overhead compared to a C++ struct), converting it to in-memory XML and then converting that to HTML or SVG using XSLT. The in-memory XML was terribly bloated to the extent that it was the bottleneck in the system, for a couple of reasons. Firstly, as noted above, the data structures had a significant overhead (150 bytes for the XML node compares to a 30 byte per element overhead in a std::list<std::string>). And secondly there's the overhead for self-description i.e. all the strings making up the tag and attribute names. This second issue could be helped if a symbol-table approach were used (e.g. the Boost.Spirit symbols class; each n-byte string would be reduced to hopefully one pointer). The view I'm coming to is that I'd prefer an XML class with enough template parameters to make these sorts of issues resolvable by the user, depending on the requirements of their application: struct node {}; template <typename STR_T = std::string> struct textnode: public node, public STR_T { .... }; template <typename STR_T = std::string> struct comment: public node, public STR_T { .... }; template <typename TAG_NAME_T = std::string, typename ATTR_NAME_T = std::string, typename ATTR_VAL_T = std::string, typename CHILD_T = boost::shared_ptr<node> > struct element: public node { TAG_NAME_T name; std::map<ATTR_NAME_T,ATTR_VAL_T> attributes; std::list<CHILD_T> children; }; You would then need parser and writer functions that would work with any suitable string and child-pointer types. I don't know whether the CHILD_T parameter can be used to support both pointers-to-children and inline-children, nor whether a copy_on_write_ptr<T> could somehow fit into this. Any thoughts? I'm also interested to hear what people think about XPath. I have not used it much outside of XSLT. My feeling is that it is easy to write "//foo" to find all the <foo> elements, but that this will descend all the way to the leaves of your tree even if you know that <foo> elements can only appear in certain contexts near the root. To do efficient searches you either need a set of index structures (like XLST's keys, or Boost.MultiIndex), or a kind of tree-traversal-iterator that lets you be a bit more selective about when to descend into a subtree and when to skip it. Regards, Phil.