
* Stefan Seefeld <seefeld@sympatico.ca> [2005-11-04 11:39]:
Anthony Williams wrote:
It is far easier to write a parser that calls user code (push model) than write a parser that can be continued (pull model), since in the pull model you have to save all the internal state in order to return to the user with each token; you basically have to write a "continuations" mechanism.
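To make the contrast concrete, here is a minimal sketch of both models over a toy tag tokenizer. The names `push_parse`, `pull_parser`, `token` and `token_kind` are hypothetical: this is not an XML parser and not any proposed API. In the push version the scanning position is just a local variable inside one loop. In the pull version it has to live in a member so that each `next()` call can resume where the previous one stopped. In a real parser that saved state would also include the open-element stack, attribute and entity-expansion state and so on, which is where the "continuations" cost comes from.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical token kinds for a toy tag tokenizer (not a real XML parser).
enum class token_kind { start_tag, end_tag, text, end_of_input };

struct token {
  token_kind kind;
  std::string value;
};

// Push model: the parser owns the loop and calls user code for each token.
// All parser state lives in local variables of this one function.
void push_parse(const std::string& input,
                const std::function<void(const token&)>& on_token) {
  std::size_t i = 0;
  while (i < input.size()) {
    if (input[i] == '<') {
      bool closing = i + 1 < input.size() && input[i + 1] == '/';
      std::size_t start = i + (closing ? 2 : 1);
      std::size_t end = input.find('>', start);
      if (end == std::string::npos) end = input.size();  // unterminated tag
      on_token({closing ? token_kind::end_tag : token_kind::start_tag,
                input.substr(start, end - start)});
      i = end + 1;
    } else {
      std::size_t end = input.find('<', i);
      if (end == std::string::npos) end = input.size();
      on_token({token_kind::text, input.substr(i, end - i)});
      i = end;
    }
  }
  on_token({token_kind::end_of_input, ""});
}

// Pull model: the application owns the loop and asks for one token at a time,
// so everything the parser needs to continue must be saved between calls.
class pull_parser {
 public:
  explicit pull_parser(std::string input) : input_(std::move(input)) {}

  // Resumes from the saved position and returns exactly one token.
  token next() {
    if (pos_ >= input_.size()) return {token_kind::end_of_input, ""};
    if (input_[pos_] == '<') {
      bool closing = pos_ + 1 < input_.size() && input_[pos_ + 1] == '/';
      std::size_t start = pos_ + (closing ? 2 : 1);
      std::size_t end = input_.find('>', start);
      if (end == std::string::npos) end = input_.size();  // unterminated tag
      pos_ = end + 1;  // save where to resume
      return {closing ? token_kind::end_tag : token_kind::start_tag,
              input_.substr(start, end - start)};
    }
    std::size_t end = input_.find('<', pos_);
    if (end == std::string::npos) end = input_.size();
    token t{token_kind::text, input_.substr(pos_, end - pos_)};
    pos_ = end;  // save where to resume
    return t;
  }

 private:
  std::string input_;
  std::size_t pos_ = 0;  // the only state that must survive between calls here
};
```

With the push version the caller hands over a callback and waits; with the pull version the caller writes an ordinary loop such as `for (token t = p.next(); t.kind != token_kind::end_of_input; t = p.next())`.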
Fair enough. But here we are (or should be) focussed on the API, i.e. the user. The question is whether to put the parser in control of the data flow or the application. While the latter is harder to implement it is also far more convenient for users.
Harder to implement could also imply a complexity that affects performance. If the user is consuming a document object model, whether that document is built via a push parser or a pull parser is moot, and the overhead of maintaining pull-parser state is nothing but a penalty.
As it happens, the implementation I have in mind uses libxml2, a C library. As such, there are two language boundaries between the application calling 'parse()' and the callbacks (C++ -> C and C -> C++), so you couldn't even throw exceptions from inside the callbacks and catch them in the main application.
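One common workaround, shown here as a minimal sketch, is to catch everything inside the C-linkage callback, park the exception, tell the C parser to stop, and rethrow once control is back in C++. `fake_c_parse`, `element_callback`, `element_trampoline` and `parse` are hypothetical stand-ins, not libxml2's API. The sketch assumes the callback's return value is how the C library is asked to stop, which real libraries each handle in their own way. It also relies on C++11's `std::exception_ptr`, which postdates this thread.

```cpp
#include <exception>
#include <functional>
#include <string>
#include <utility>

extern "C" {
// Hypothetical C-style callback: returns non-zero to ask the parser to stop.
typedef int (*element_callback)(void* user_data, const char* name);
}

// Stand-in for a C library entry point (NOT libxml2's API). It reports each
// name to the callback and stops early if the callback returns non-zero.
// C code has no notion of C++ exceptions, so none may propagate through it.
extern "C" void fake_c_parse(const char* const* names, int count,
                             element_callback cb, void* user_data) {
  for (int i = 0; i < count; ++i) {
    if (cb(user_data, names[i]) != 0) return;
  }
}

// Per-parse context passed through the C library's void* user_data.
struct parse_context {
  std::function<void(const std::string&)> handler;
  std::exception_ptr error;  // exception captured inside a callback
};

// Trampoline with C linkage: never lets an exception escape into C code.
extern "C" int element_trampoline(void* user_data, const char* name) {
  auto* ctx = static_cast<parse_context*>(user_data);
  try {
    ctx->handler(name);
    return 0;
  } catch (...) {
    ctx->error = std::current_exception();  // park the exception
    return 1;                               // ask the C parser to stop
  }
}

// C++ entry point: after the C call returns, rethrow any parked exception
// so the application can catch it normally.
void parse(const char* const* names, int count,
           std::function<void(const std::string&)> handler) {
  parse_context ctx{std::move(handler), nullptr};
  fake_c_parse(names, count, &element_trampoline, &ctx);
  if (ctx.error) std::rethrow_exception(ctx.error);
}
```

The application then wraps `parse(...)` in an ordinary `try`/`catch`; the exception thrown from the handler arrives with its original type.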
That's one of my main criticisms of your suggested API --- it's too tightly bound to libxml, and doesn't really allow for substitution of another parser.
Could you substantiate your claim?
Sorting out exception handling through an event framework like a push parser framework is no small challenge. I've always been critical of the Java SAXException: it is checked, and it cannot wrap a runtime exception, two choices that maximize the challenges of tunneling exceptions.
My other criticism so far is the node::type() function. I really don't believe in such type tags; we should be using virtual function dispatch instead, via the Visitor pattern. Your traversal example could then ditch the traverse(node_ptr) overload, and instead be called with document->root.visit(traversal).
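A minimal sketch of the Visitor approach described above, using hypothetical `node`, `element`, `text`, `visitor` and `traversal` classes (not the API under discussion). The node-side method is named `accept` here, the conventional Gang-of-Four name; in the message's spelling, `document->root.visit(traversal)` would correspond to `root.accept(t)`. Dispatch happens through two virtual calls, so no `type()` tag and no per-type `traverse()` overload is needed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node classes for illustration; these are not the proposed API.
class element;
class text;

// Visitor interface: one overload per concrete node type.
class visitor {
 public:
  virtual ~visitor() = default;
  virtual void visit(element& e) = 0;
  virtual void visit(text& t) = 0;
};

class node {
 public:
  virtual ~node() = default;
  // Double dispatch: each subclass calls the matching visitor overload.
  virtual void accept(visitor& v) = 0;
};

class text : public node {
 public:
  explicit text(std::string content) : content_(std::move(content)) {}
  const std::string& content() const { return content_; }
  void accept(visitor& v) override { v.visit(*this); }

 private:
  std::string content_;
};

class element : public node {
 public:
  explicit element(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }
  std::vector<std::unique_ptr<node>>& children() { return children_; }
  void accept(visitor& v) override { v.visit(*this); }

 private:
  std::string name_;
  std::vector<std::unique_ptr<node>> children_;
};

// A traversal that re-serializes the tree in document order,
// with no type() switch and no per-type traverse() overload.
class traversal : public visitor {
 public:
  std::string out;
  void visit(element& e) override {
    out += "<" + e.name() + ">";
    for (auto& child : e.children()) child->accept(*this);
    out += "</" + e.name() + ">";
  }
  void visit(text& t) override { out += t.content(); }
};
```

New operations become new `visitor` subclasses without touching the node classes; the flip side is that adding a node type means adding an overload to every visitor.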
Node types aren't (runtime-) polymorphic right now, but is that really a big deal?
Polymorphism is important for extensibility. However, here the set of node types is well known (and rather limited).
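For comparison, a minimal sketch of the type-tag style defended here, assuming a hypothetical closed `node_type` enum and a plain `node` struct (not the proposed API). Because the set of kinds is fixed, a single `switch` covers every case, and GCC and Clang can warn via `-Wswitch` if an enumerator is left unhandled.

```cpp
#include <string>
#include <vector>

// Hypothetical closed set of node kinds; illustrative, not the proposed API.
enum class node_type { element, text, comment };

struct node {
  node_type type;
  std::string value;           // element name, text content, or comment body
  std::vector<node> children;  // only used by elements; vector of an
                               // incomplete type is allowed since C++17
};

// Traversal by switching on the type tag. With no default label, compilers
// can warn (-Wswitch) when an enumerator is missing from the switch.
std::string serialize(const node& n) {
  switch (n.type) {
    case node_type::element: {
      std::string out = "<" + n.value + ">";
      for (const node& child : n.children) out += serialize(child);
      return out + "</" + n.value + ">";
    }
    case node_type::text:
      return n.value;
    case node_type::comment:
      return "<!--" + n.value + "-->";
  }
  return {};  // not reached for valid enumerators
}
```

Adding a kind here means touching the enum and every switch over it; that cost stays small only as long as the set stays small and fixed, which is exactly the assumption being made.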
What about the Post-Schema-Validation Infoset (PSVI)? With XML Schema, the types of nodes are unlimited.

-- 
Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/