Re: [boost] Proposal: XML APIs in boost

4 Nov 2005

      Stefan Seefeld <seefeld@sympatico.ca> writes:
...
Jez Higgins wrote:
...
...
A better API that still follows the cursor-style approach from SAX
is the XMLReader. It uses a pull model instead of push, i.e. there
are no callbacks, but instead the application advances the reader's
internal cursor to the next 'token'.
See http://xmlsoft.org/xmlreader.html for a comparison to SAX.
For some definition of better.  The unpleasantness with pull APIs is the 
token - you have to interrogate it for its actual type, and then 
dispatch.
Granted. But the underlying parser which any SAX implementation would
build on would have to do that, too. You can think of the reader as
that lower layer, and thus a push API with type-safe dispatching
can easily be built on top, if that is what you want.
Of course, the other direction is possible, too. However, logistically
it is easier to put the push layer over the pull layer, i.e. the SAX
implementation on top of the reader:
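
To make the layering concrete, here is a minimal sketch of a push (SAX-like) dispatcher written on top of a pull reader. Everything in it is hypothetical and for illustration only: token_kind, token, pull_reader, content_handler, push_parse and recording_handler are invented names, not part of libxml2, Axemill or the proposed Boost API, and the toy reader walks a pre-built token vector instead of parsing real XML.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; not taken from any real library.
enum class token_kind { start_element, end_element, text };

struct token {
    token_kind kind;
    std::string value;
};

// Toy pull reader: the caller advances the cursor one token at a time.
// A real reader would tokenise its input lazily.
class pull_reader {
public:
    explicit pull_reader(std::vector<token> tokens) : tokens_(std::move(tokens)) {}

    // Move to the next token; returns false once the input is exhausted.
    bool next() {
        if (pos_ >= tokens_.size()) return false;
        current_ = &tokens_[pos_++];
        return true;
    }

    // Only valid after next() has returned true.
    const token& current() const { return *current_; }

private:
    std::vector<token> tokens_;
    std::size_t pos_ = 0;
    const token* current_ = nullptr;
};

// SAX-like handler interface: one virtual function per event type.
struct content_handler {
    virtual ~content_handler() = default;
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// The push layer: the only place that inspects the token type.
void push_parse(pull_reader& reader, content_handler& handler) {
    while (reader.next()) {
        const token& t = reader.current();
        switch (t.kind) {
        case token_kind::start_element: handler.start_element(t.value); break;
        case token_kind::end_element:   handler.end_element(t.value);   break;
        case token_kind::text:          handler.characters(t.value);    break;
        }
    }
}

// Example handler that records the events it receives.
struct recording_handler : content_handler {
    std::vector<std::string> events;
    void start_element(const std::string& name) override { events.push_back("start:" + name); }
    void end_element(const std::string& name) override { events.push_back("end:" + name); }
    void characters(const std::string& text) override { events.push_back("text:" + text); }
};
```

The switch on the token kind appears once, inside push_parse; application code written against content_handler never sees it.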
Surely it depends on which parser you use. My XML-parser-in-progress
(sourceforge.net/projects/axemill) uses a callback mechanism akin to SAX; at
the moment, that's all there is, as I haven't written a DOM yet.

It is far easier to write a parser that calls user code (push model) than
write a parser that can be continued (pull model), since in the pull model you
have to save all the internal state in order to return to the user with each
token; you basically have to write a "continuations" mechanism.
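
A toy sketch of the difference, under loud assumptions: the "grammar" here is just nested parentheses, not XML, and event, push_parse and pull_parser are hypothetical names invented for this example. In the push version the nesting depth lives on the call stack of a recursive function; in the pull version the same information has to be stored in the parser object so that it survives each return to the caller.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Toy grammar for illustration only: '(' opens a group, ')' closes it,
// any other character is a leaf. Input is assumed to be balanced.
struct event {
    char symbol;
    int depth;
};

// Push model: parser state (the nesting depth) is held implicitly on the
// call stack, and user code is called from inside the recursion.
void push_parse(const std::string& in, std::size_t& pos, int depth,
                const std::function<void(const event&)>& on_event) {
    while (pos < in.size()) {
        char c = in[pos++];
        if (c == '(') {
            on_event(event{c, depth});
            push_parse(in, pos, depth + 1, on_event);  // descend into the group
        } else if (c == ')') {
            on_event(event{c, depth - 1});
            return;                                    // unwind one level
        } else {
            on_event(event{c, depth});
        }
    }
}

// Pull model: the parser returns to the caller after every token, so the
// position and depth must be saved explicitly as member data.
class pull_parser {
public:
    explicit pull_parser(std::string in) : in_(std::move(in)) {}

    // Produce the next event; returns false at end of input.
    bool next(event& out) {
        if (pos_ >= in_.size()) return false;
        char c = in_[pos_++];
        if (c == '(')      out = event{c, depth_++};
        else if (c == ')') out = event{c, --depth_};
        else               out = event{c, depth_};
        return true;
    }

private:
    std::string in_;
    std::size_t pos_ = 0;
    int depth_ = 0;
};
```

For this grammar the saved state is only two integers. For a real XML parser it would also have to cover things like the open-element stack, namespace scopes and any partially consumed input, which is where the extra effort comes from.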
...
As it happens, the implementation I have in mind uses libxml2, a C
library. As such between the application calling 'parse()' and the
callbacks are two language boundaries (C++ -> C and C -> C++), so
you couldn't even throw exceptions from inside the callbacks and
catch them in the main application.
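
For illustration, one common way to live with such a boundary is to catch everything in the C-linkage callback, store it, and rethrow once the C function has returned. The sketch below assumes a C++11 compiler (std::exception_ptr) and uses a made-up stand-in for the C library: c_parse, c_callback, parse_context, trampoline and parse are hypothetical names, not libxml2 functions and not part of the proposed API.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for a C parsing library (hypothetical; not libxml2's API).
extern "C" {
typedef void (*c_callback)(const char* name, void* user_data);

// Pretends to parse a document and reports two elements.
void c_parse(c_callback cb, void* user_data) {
    cb("root", user_data);
    cb("child", user_data);
}
}

// State shared between the C++ caller and the trampoline.
struct parse_context {
    void (*handler)(const std::string& name);
    std::exception_ptr error;  // first exception thrown by the handler, if any
};

// Callback with C linkage: no exception may escape from here into C code.
extern "C" void trampoline(const char* name, void* user_data) {
    parse_context* ctx = static_cast<parse_context*>(user_data);
    if (ctx->error) return;  // an earlier callback already failed; skip the rest
    try {
        ctx->handler(name);
    } catch (...) {
        ctx->error = std::current_exception();
    }
}

// C++ entry point: runs the C parser, then rethrows on the C++ side.
void parse(void (*handler)(const std::string&)) {
    parse_context ctx{handler, std::exception_ptr()};
    c_parse(&trampoline, &ctx);
    if (ctx.error) std::rethrow_exception(ctx.error);
}
```

Note that the C parser still runs to completion after the handler fails; the sketch only suppresses further handler calls. Stopping the underlying parser early would need whatever abort mechanism the C library itself offers.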
That's one of my main criticisms of your suggested API --- it's too tightly
bound to libxml, and doesn't really allow for substitution of another parser.

My other criticism so far is the node::type() function. I really don't believe
in such type tags; we should be using virtual function dispatch instead, using
the Visitor pattern. Your traversal example could then ditch the
traverse(node_ptr) overload, and instead be called with
document->root.visit(traversal)
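
A minimal sketch of what that could look like, assuming a two-type node hierarchy. The names node_visitor, node, element and text, and the serialising traversal, are hypothetical and chosen only to match the visit(traversal) call above; they are not taken from the proposed API or from Axemill.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct element;
struct text;

// Visitor interface: one overload per concrete node type, so no type tag
// is ever inspected by user code.
struct node_visitor {
    virtual ~node_visitor() = default;
    virtual void visit(element& e) = 0;
    virtual void visit(text& t) = 0;
};

struct node {
    virtual ~node() = default;
    // Double dispatch: each concrete node calls the matching overload.
    virtual void visit(node_visitor& v) = 0;
};

struct text : node {
    explicit text(std::string v) : value(std::move(v)) {}
    void visit(node_visitor& v) override { v.visit(*this); }
    std::string value;
};

struct element : node {
    explicit element(std::string n) : name(std::move(n)) {}
    void visit(node_visitor& v) override { v.visit(*this); }
    std::string name;
    std::vector<std::unique_ptr<node>> children;
};

// Example traversal: writes the tree back out as markup (no escaping,
// no attributes; this is a toy).
struct traversal : node_visitor {
    std::string out;
    void visit(element& e) override {
        out += "<" + e.name + ">";
        for (auto& child : e.children) child->visit(*this);
        out += "</" + e.name + ">";
    }
    void visit(text& t) override { out += t.value; }
};
```

The recursion into children happens inside the visitor, so a separate traverse(node_ptr) overload that switches on node::type() is not needed. The usual trade-off applies: adding a new node type means adding an overload to every visitor.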
...
If, on the other hand, the callback dispatcher itself was written in
C++, no language boundaries would need to be crossed while unwinding
the callback stack.
Yes. Axemill would allow that, for example.

Anthony
-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk