
Stefan Seefeld wrote:
Over the last couple of years we have discussed possible XML APIs for inclusion into Boost. As I already had an early prototype for such an API, I kept evolving it, based on feedback from those discussions. A couple of weeks ago I actually checked it into the sandbox (http://svn.boost.org/trac/boost/browser/sandbox/xml).
PS: The current scope of the project is described in http://svn.boost.org/trac/boost/browser/sandbox/xml/README
Hi Stefan,

My comments follow; these are based on maybe half an hour looking at your code, so it's quite possible that I have missed something. As others have pointed out, it would be easier to evaluate with some more docs...

I certainly agree that C++ would benefit from an XML API and Boost is a good place to develop it.

As far as I can see, what you have is a wrapper around the GNOME libxml2 (which has an MIT license and is cross-platform) that implements something that you call dom, but is not the standardised "DOM" API for XML (http://www.w3.org/DOM/).

I think that two C++ APIs for XML document manipulation could be justified:

(a) DOM. This has the benefit of being standardised, so you can transfer at least your experience and to some extent actual code from one language to another (e.g. C++ to/from JavaScript in my case). On the other hand it is a rather verbose and unenjoyable API that isn't a great match to 'modern' C++.

(b) A standard-library-like API (e.g. attributes are a map, child nodes are a sequence). This would have the benefit of familiarity to users of the C++ standard library, and I think it would be a more concise and usable API.

As far as I can see, what you have created is something that isn't (a) or (b) but falls somewhere between. For example, you provide iterators rather than the nextSibling-style functions of DOM, but you provide custom functions like append_element and set_attribute rather than standard-library-like append() and operator[] implementations. For example, compare:

- DOM: e.setAttribute("color","red"); e.appendChild(doc.createElement("P"));
- Yours: e.set_attribute("color","red"); e.append_element("P");
- STL-like: e.attributes["color"]="red"; e.children.push_back(new Element("P"));

In the past I have used a library called xmlwrapp. You should take a look at it if you have not done so already. It has a very liberal license (boost-like). It is also a C++ libxml2 wrapper and as I recall its style is similar to yours. It seemed to do nearly everything that I wanted.

I remember being confused about the ownership semantics of pointed-to objects sometimes; what is your policy? (e.g. if I copy a subtree to another place in the document, is it a deep copy or a pointer copy? Copy-on-write? When is it freed? Reference counted?)

I was also surprised once with the memory inefficiency: you might like to consider how many MB of RAM are needed to store in-memory a document that is X MB on disk, for examples with many small nodes or fewer larger nodes. In my case, it would have helped to use some sort of dictionary for element and attribute names.

One thing that xmlwrapp did not offer was a way to access the underlying libxml2 C 'object'. While this is normally an implementation detail that you would like to hide, note that there are other C libraries that you might want to use; I think the one that I was looking at was the SVG renderer librsvg [attn Jake!]. I wanted to build an in-memory XML/SVG document in my C++ code and then convert it to a bitmap, but because xmlwrapp wouldn't let me get at the raw libxml2 stuff, I couldn't, and had to go via a temporary file. (Or maybe I hacked it, can't remember.) Doing XSLT transformations would be another example where this would be necessary.

I hope these comments are useful; what do others think?

Regards, Phil.
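
To make option (b) and the ownership question a little more concrete, here is a minimal sketch of what a standard-library-like element type might look like. Everything in it is hypothetical: the Element type and the make_example and check_deep_copy helpers are invented for illustration and are not taken from the sandbox code, xmlwrapp, or libxml2. Unlike the one-line "STL-like" example above, which pushes a pointer, this sketch stores children by value, which is only one of several possible ownership policies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type, for illustration only. It is not the sandbox API,
// not xmlwrapp, and not a libxml2 wrapper.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;  // attributes are a map
    std::vector<Element> children;                   // child nodes are a sequence

    explicit Element(std::string n) : name(std::move(n)) {}
};

// Builds <BODY color="red"><P/></BODY> using only standard-container operations.
Element make_example() {
    Element e("BODY");
    e.attributes["color"] = "red";
    e.children.push_back(Element("P"));
    return e;
}

// Because children are held by value, copying a subtree is a deep copy:
// changing the copy leaves the original untouched.
void check_deep_copy() {
    Element original = make_example();
    Element copy = original;
    copy.children[0].attributes["class"] = "intro";
    assert(original.children[0].attributes.empty());
    assert(copy.children[0].attributes.size() == 1);
}
```

With this choice the answers to "deep copy or pointer copy?" and "when is it freed?" follow directly from ordinary value semantics, at the cost of making subtree copies expensive and of not sharing storage for repeated element and attribute names. A pointer-based or reference-counted design would trade those properties the other way.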