
Stefan Seefeld <seefeld@sympatico.ca> writes:
Jez Higgins wrote:
A better API that still follows the cursor-style approach of SAX is the XMLReader. It uses a pull model instead of push, i.e. there are no callbacks; instead, the application advances the reader's internal cursor to the next 'token'. See http://xmlsoft.org/xmlreader.html for a comparison to SAX.
For some definition of better. The unpleasantness with pull APIs is the token - you have to interrogate it for its actual type, and then dispatch.
Granted. But the underlying parser that any SAX implementation builds on has to do that, too. You can think of the reader as that lower layer, and thus a push API with type-safe dispatching can easily be built on top, if that is what you want.
Of course, the other direction is possible, too. However, logistically it is easier to put the push layer over the pull layer, i.e. the SAX implementation on top of the reader:
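To make the layering concrete, here is a minimal sketch of a push layer driven by a pull layer. Everything in it is hypothetical: `pull_reader`, `token`, `content_handler` and `parse` are illustrative names, the reader walks a pre-built token list instead of parsing real XML, and none of this is the libxml2 xmlTextReader API. The point is only that the token-type switch lives in one place and user code sees typed callbacks.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token model, for illustration only.
enum class token_type { start_element, end_element, text };

struct token {
    token_type type;
    std::string value;  // element name, or character data for text tokens
};

// Pull layer: the caller advances the cursor one token at a time.
class pull_reader {
public:
    explicit pull_reader(std::vector<token> tokens) : tokens_(std::move(tokens)) {}

    // Advance to the next token; returns false once the input is exhausted.
    bool read() {
        if (pos_ >= tokens_.size()) return false;
        ++pos_;
        return true;
    }

    // Token under the cursor; only valid after read() has returned true.
    const token& current() const { return tokens_[pos_ - 1]; }

private:
    std::vector<token> tokens_;
    std::size_t pos_ = 0;  // number of tokens consumed so far
};

// Push layer: SAX-style interface with one virtual function per event.
struct content_handler {
    virtual ~content_handler() = default;
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void characters(const std::string& data) = 0;
};

// The only place that interrogates the token type and dispatches on it.
void parse(pull_reader& reader, content_handler& handler) {
    while (reader.read()) {
        const token& t = reader.current();
        switch (t.type) {
            case token_type::start_element: handler.start_element(t.value); break;
            case token_type::end_element:   handler.end_element(t.value);   break;
            case token_type::text:          handler.characters(t.value);    break;
        }
    }
}

// Example handler that records the events it receives as a string.
struct trace_handler : content_handler {
    std::string log;
    void start_element(const std::string& name) override { log += "<" + name + ">"; }
    void end_element(const std::string& name) override { log += "</" + name + ">"; }
    void characters(const std::string& data) override { log += data; }
};
```

Feeding a reader that holds the tokens for `<a>hi</a>` to `parse()` with a `trace_handler` leaves that same markup in `log`; the handler never sees a type tag.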
Surely it depends on which parser you use. My XML-parser-in-progress (sourceforge.net/projects/axemill) uses a callback mechanism akin to SAX; at the moment, that's all there is, as I haven't written a DOM yet. It is far easier to write a parser that calls user code (push model) than write a parser that can be continued (pull model), since in the pull model you have to save all the internal state in order to return to the user with each token; you basically have to write a "continuations" mechanism.
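A toy illustration of the difference, assuming nothing about Axemill's actual code: the two hypothetical word splitters below do the same job. In the push version the scan position is an ordinary local variable, because the scanner owns the loop. In the pull version the position has to be kept in the object between calls, because control returns to the user after every token. A real XML parser has far more state than one index (open-element stack, entity expansion and so on), which is where the extra effort comes from.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Push: the scanner owns the loop and calls user code for each word.
// All scanning state lives in local variables.
void push_words(const std::string& text,
                const std::function<void(const std::string&)>& on_word) {
    std::size_t pos = 0;
    while (pos < text.size()) {
        while (pos < text.size() && text[pos] == ' ') ++pos;  // skip spaces
        const std::size_t start = pos;
        while (pos < text.size() && text[pos] != ' ') ++pos;  // scan one word
        if (pos > start) on_word(text.substr(start, pos - start));
    }
}

// Pull: the user owns the loop, so the scanner must save its position
// in a member and resume from it on the next call.
class pull_words {
public:
    explicit pull_words(std::string text) : text_(std::move(text)) {}

    // Stores the next word in `word`; returns false when none are left.
    bool next(std::string& word) {
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
        if (pos_ == text_.size()) return false;
        const std::size_t start = pos_;
        while (pos_ < text_.size() && text_[pos_] != ' ') ++pos_;
        word = text_.substr(start, pos_ - start);
        return true;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;  // saved state that the push version keeps on the stack
};
```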
As it happens, the implementation I have in mind uses libxml2, a C library. As such, between the application calling 'parse()' and the callbacks there are two language boundaries (C++ -> C and C -> C++), so you couldn't even throw exceptions from inside the callbacks and catch them in the main application.
That's one of my main criticisms of your suggested API --- it's too tightly bound to libxml, and doesn't really allow for substitution of another parser. My other criticism so far is the node::type() function. I really don't believe in such type tags; we should be using virtual function dispatch instead, using the Visitor pattern. Your traversal example could then ditch the traverse(node_ptr) overload, and instead be called with document->root.visit(traversal)
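As a rough sketch of what that could look like (all class names here are hypothetical, not taken from the proposed API, and only element and text nodes are modelled): each node type forwards itself to the matching visitor overload, so no type tag is ever inspected.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class element;
class text;

// One overload per concrete node type replaces the node::type() tag.
struct visitor {
    virtual ~visitor() = default;
    virtual void visit(element& e) = 0;
    virtual void visit(text& t) = 0;
};

class node {
public:
    virtual ~node() = default;
    // Named visit() to match root.visit(traversal) above.
    virtual void visit(visitor& v) = 0;
};

class text : public node {
public:
    explicit text(std::string content) : content_(std::move(content)) {}
    const std::string& content() const { return content_; }
    void visit(visitor& v) override { v.visit(*this); }  // selects visit(text&)

private:
    std::string content_;
};

class element : public node {
public:
    explicit element(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void append(std::unique_ptr<node> child) { children_.push_back(std::move(child)); }
    const std::vector<std::unique_ptr<node>>& children() const { return children_; }
    void visit(visitor& v) override { v.visit(*this); }  // selects visit(element&)

private:
    std::string name_;
    std::vector<std::unique_ptr<node>> children_;
};

// Example traversal: writes the tree back out as markup, recursing
// through the children of each element.
struct traversal : visitor {
    std::string out;
    void visit(element& e) override {
        out += "<" + e.name() + ">";
        for (const auto& child : e.children()) child->visit(*this);
        out += "</" + e.name() + ">";
    }
    void visit(text& t) override { out += t.content(); }
};
```

With a root element `a` holding a text node and an empty element `b`, `root.visit(t)` on a `traversal t` walks the whole tree without a switch on the node type.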
If, on the other hand, the callback dispatcher itself was written in C++, no language boundaries would need to be crossed while unwinding the callback stack.
Yes. Axemill would allow that, for example.

Anthony
-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk