Re: [boost] RFC: Boost.XML API prototype in the sandbox

9 Jul 2007

Stefan Seefeld wrote:
...
over the last couple of years we have discussed possible XML APIs
for inclusion into boost. As I already had an early prototype for
such an API, I kept evolving it, based on feedback from those
discussions.
A couple of weeks ago I actually checked it into the sandbox
(http://svn.boost.org/trac/boost/browser/sandbox/xml).
...
PS: The current scope of the project is described in
http://svn.boost.org/trac/boost/browser/sandbox/xml/README

Hi Stefan,

My comments follow; they are based on maybe half an hour of looking at 
your code, so it's quite possible that I have missed something.  As 
others have pointed out, it would be easier to evaluate with some more docs.

I certainly agree that C++ would benefit from an XML API and Boost is a 
good place to develop it.

As far as I can see, what you have is a wrapper around the GNOME 
libxml2 (which is MIT-licensed and cross-platform) that 
implements something that you call dom, but which is not the standardised 
"DOM" API for XML (http://www.w3.org/DOM/).

I think that two C++ APIs for XML document manipulation could be justified:

(a) DOM.  This has the benefit of being standardised, so you can 
transfer at least your experience and to some extent actual code from 
one language to another (e.g. C++ to/from Javascript in my case).  On 
the other hand it is a rather verbose and unenjoyable API that isn't a 
great match to 'modern' C++.

(b) A standard-library-like API (e.g. attributes are a map, child nodes 
are a sequence).  This would have the benefit of familiarity to users 
of the C++ standard library, and I think it would be a more concise and 
usable API.

As far as I can see, what you have created is neither (a) nor (b) but 
falls somewhere between.  You provide iterators rather than the 
nextSibling-style functions of DOM, but you provide custom functions 
like append_element and set_attribute rather than 
standard-library-like append() and operator[] implementations.  
Compare:

- DOM:
e.setAttribute("color","red");
e.appendChild(doc.createElement("P"));

- Yours:
e.set_attribute("color","red");
e.append_element("P");

- STL-like:
e.attributes["color"]="red";
e.children.push_back(new Element("P"));
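To make the STL-like option a bit more concrete, here is a minimal sketch of what I have in mind.  Everything in it is an assumption for illustration: the Element type, its attributes and children members, the serialize() helper and the "div" element name are hypothetical, and none of it is taken from your sandbox code or from libxml2.  It uses std::unique_ptr so that the ownership of children is explicit.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type, for illustration only.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;   // attributes are a map
    std::vector<std::unique_ptr<Element>> children;  // child nodes are a sequence

    explicit Element(std::string n) : name(std::move(n)) {}
};

// Naive serializer so the result can be inspected.
// It does no escaping of attribute values and ignores text nodes.
std::string serialize(const Element& e) {
    std::string out = "<" + e.name;
    for (const auto& kv : e.attributes)
        out += " " + kv.first + "=\"" + kv.second + "\"";
    if (e.children.empty())
        return out + "/>";
    out += ">";
    for (const auto& child : e.children)
        out += serialize(*child);
    return out + "</" + e.name + ">";
}

// Mirrors the two "STL-like" lines above; the parent owns the new child.
std::string stl_like_demo() {
    Element e("div");
    e.attributes["color"] = "red";
    e.children.push_back(std::unique_ptr<Element>(new Element("P")));
    return serialize(e);
}
```

With containers exposed like this, the standard algorithms and range-based loops work on attributes and children without any XML-specific functions, which is the familiarity argument for (b).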

In the past I have used a library called xmlwrapp.  You should take a 
look at it if you have not done so already.  It has a very liberal 
license (boost-like).  It is also a C++ libxml2 wrapper and as I recall 
its style is similar to yours.  It seemed to do nearly everything that 
I wanted.  I remember being confused about the ownership semantics of 
pointed-to objects sometimes; what is your policy?  (e.g. if I copy a 
subtree to another place in the document, is it a deep copy or a 
pointer copy?  Copy-on-write?  When is it freed?  Reference counted?)  
I was also surprised once by the memory inefficiency: you might like 
to consider how many MB of RAM are needed to store in-memory a document 
that is X MB on disk, for example with many small nodes or with fewer, 
larger nodes.  In my case, it would have helped to use some sort of 
dictionary for element and attribute names.
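By "dictionary" I mean something along the lines of the following sketch, in which each distinct name is stored once and nodes hold a small integer id instead of their own copy of the string.  NameTable, CompactNode and the "row"/"cell" names are hypothetical; I am not claiming that xmlwrapp, libxml2 or your prototype work this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name dictionary: every distinct element or attribute name
// gets one id, however many nodes use it.
class NameTable {
public:
    using Id = std::size_t;

    // Return the id for `name`, adding it to the table on first use.
    Id intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        Id id = names_.size();
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Map an id back to its name; throws std::out_of_range for a bad id.
    const std::string& lookup(Id id) const { return names_.at(id); }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, Id> ids_;  // name -> id
    std::vector<std::string> names_;           // id -> name
};

// A node that stores an id rather than its own copy of the name.
struct CompactNode {
    NameTable::Id name;
};

// Build 1000 nodes that share only two distinct names.
std::size_t name_table_demo() {
    NameTable names;
    std::vector<CompactNode> nodes;
    for (int i = 0; i < 1000; ++i)
        nodes.push_back(CompactNode{names.intern(i % 2 ? "row" : "cell")});
    return names.size();  // number of distinct names actually stored
}
```

For a document with many small nodes, the per-node cost of the name then drops to the size of one integer, at the price of an extra lookup when the name is needed as text.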

One thing that xmlwrapp did not offer was a way to access the 
underlying libxml2 C 'object'.  While this is normally an 
implementation detail that you would like to hide, note that there are 
other C libraries that you might want to use; I think the one that I 
was looking at was the SVG renderer librsvg [attn Jake!].  I wanted to 
build an in-memory XML/SVG document in my C++ code and then convert it 
to a bitmap, but because xmlwrapp wouldn't let me get at the raw 
libxml2 stuff, I couldn't, and had to go via a temporary file.  (Or 
maybe I hacked it, can't remember.)  Doing XSLT transformations would 
be another example where this would be necessary.
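The sort of escape hatch I mean could look like the sketch below.  Note that c_doc, c_doc_new, c_doc_free and c_render are stand-ins that I have made up to mimic the shape of a C library; they are not libxml2 or librsvg declarations, and Document and native_handle() are hypothetical names too.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for a C library's interface.  Hypothetical: these only mimic
// the shape of a C API and are NOT real libxml2 or librsvg declarations.
struct c_doc { int node_count; };
inline c_doc* c_doc_new() { return new c_doc{0}; }
inline void c_doc_free(c_doc* d) { delete d; }
// Pretend "other C library" entry point that consumes the raw document.
inline int c_render(const c_doc* d) { return d->node_count; }

// Hypothetical C++ wrapper that owns the C object.
class Document {
public:
    Document() : doc_(c_doc_new(), &c_doc_free) {}

    void append_element() { ++doc_->node_count; }

    // Escape hatch: non-owning access to the underlying C object, so that
    // it can be passed to other C libraries.  The wrapper keeps ownership.
    c_doc* native_handle() { return doc_.get(); }

private:
    std::unique_ptr<c_doc, void (*)(c_doc*)> doc_;
};

// Build the document in C++, then hand the raw object to the C function
// directly instead of going via a temporary file.
int native_handle_demo() {
    Document d;
    d.append_element();
    d.append_element();
    return c_render(d.native_handle());
}
```

The pointer must not outlive the wrapper and the caller must not free it; that would need to be documented, but it seems a small price compared with going via a temporary file.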

I hope these comments are useful; what do others think?

Regards,

Phil.

Phil Endecott