
Anthony Williams wrote:
It is far easier to write a parser that calls user code (push model) than write a parser that can be continued (pull model), since in the pull model you have to save all the internal state in order to return to the user with each token; you basically have to write a "continuations" mechanism.
Fair enough. But here we are (or should be) focussed on the API, i.e. the user. The question is whether the parser or the application should be in control of the data flow. While the latter is harder to implement, it is also far more convenient for users.
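To make the push/pull distinction concrete, here is a minimal sketch over a toy whitespace tokenizer rather than XML. The names push_parse, pull_parser, and the two collect helpers are hypothetical and not part of any proposed API. The pull version has to keep its position in a member between calls, which is the saved state mentioned above; in exchange, the application can stop whenever it likes.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Push model: the parser owns the loop and calls user code for each token.
void push_parse(const std::string& input,
                const std::function<void(const std::string&)>& on_token) {
    std::size_t pos = 0;
    while (pos < input.size()) {
        while (pos < input.size() && input[pos] == ' ') ++pos;  // skip separators
        const std::size_t start = pos;
        while (pos < input.size() && input[pos] != ' ') ++pos;  // scan one token
        if (pos > start) on_token(input.substr(start, pos - start));
    }
}

// Pull model: the application owns the loop, so the parser must remember
// where it stopped (pos_) between successive calls to next().
class pull_parser {
public:
    explicit pull_parser(std::string input) : input_(std::move(input)) {}

    // Returns the next token, or std::nullopt once the input is exhausted.
    std::optional<std::string> next() {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
        if (pos_ == input_.size()) return std::nullopt;
        const std::size_t start = pos_;
        while (pos_ < input_.size() && input_[pos_] != ' ') ++pos_;
        return input_.substr(start, pos_ - start);
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// Push usage: the callback sees every token; it cannot simply stop the parse.
std::vector<std::string> collect_push(const std::string& input) {
    std::vector<std::string> tokens;
    push_parse(input, [&tokens](const std::string& t) { tokens.push_back(t); });
    return tokens;
}

// Pull usage: the application decides when to stop asking for tokens.
std::vector<std::string> collect_pull_until(const std::string& input,
                                            const std::string& stop) {
    std::vector<std::string> tokens;
    pull_parser parser(input);
    while (auto token = parser.next()) {
        if (*token == stop) break;
        tokens.push_back(*token);
    }
    return tokens;
}
```

The sketch needs C++17 for std::optional. In a real XML parser the saved state is far larger than a single index, which is where the implementation cost comes from.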
As it happens, the implementation I have in mind uses libxml2, a C library. As such, there are two language boundaries between the application calling 'parse()' and the callbacks (C++ -> C and C -> C++), so you couldn't even throw exceptions from inside the callbacks and catch them in the main application.
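One common way around the exception problem, sketched here under assumptions, is to catch everything inside the callback, store it in a std::exception_ptr, and rethrow once the C call has returned. The function c_style_parse below is a hypothetical stand-in for a C library entry point and is not libxml2's API; callback_state, trampoline, and parse are likewise made up for illustration.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical C-style callback signature; a real one would come from a C header.
extern "C" {
typedef void (*c_char_callback)(void* user_data, char c);
}

// Hypothetical stand-in for the C library: invokes cb once per character.
void c_style_parse(const char* text, c_char_callback cb, void* user_data) {
    for (const char* p = text; *p != '\0'; ++p) cb(user_data, *p);
}

// State shared between the C++ wrapper and the trampoline.
struct callback_state {
    std::function<void(char)> handler;
    std::exception_ptr error;  // empty until a handler call throws
};

// The trampoline is what the C code calls. No exception may escape it,
// so any exception is captured and the remaining callbacks become no-ops.
extern "C" void trampoline(void* user_data, char c) {
    auto* state = static_cast<callback_state*>(user_data);
    if (state->error) return;
    try {
        state->handler(c);
    } catch (...) {
        state->error = std::current_exception();
    }
}

// C++ wrapper: runs the C parse, then rethrows on the C++ side of the boundary.
void parse(const std::string& text, std::function<void(char)> handler) {
    callback_state state;
    state.handler = std::move(handler);
    c_style_parse(text.c_str(), &trampoline, &state);
    if (state.error) std::rethrow_exception(state.error);
}

// Usage: an exception thrown in the handler reaches the caller of parse().
bool exception_reaches_caller() {
    try {
        parse("ab!c", [](char c) {
            if (c == '!') throw std::runtime_error("bad character");
        });
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

The stand-in has no way to abort early, so the sketch simply ignores input after the first failure. Whether a real backend can stop sooner depends on that library.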
That's one of my main criticisms of your suggested API --- it's too tightly bound to libxml, and doesn't really allow for substitution of another parser.
Could you substantiate your claim?
My other criticism so far is the node::type() function. I really don't believe in such type tags; we should be using virtual function dispatch instead, using the Visitor pattern. Your traversal example could then ditch the traverse(node_ptr) overload, and instead be called with document->root.visit(traversal).
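For readers unfamiliar with the pattern, the Visitor-based design being proposed might look roughly like the sketch below. Every name in it (visitor, node, element, text, text_collector) is hypothetical and simplified. The entry point is called accept() here, as is conventional, where the paragraph above calls it visit().

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct element;
struct text;

// One overload per node type; the set of node types is fixed in this interface.
struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const element& e) = 0;
    virtual void visit(const text& t) = 0;
};

// Polymorphic base: each concrete node dispatches to the matching overload.
struct node {
    virtual ~node() = default;
    virtual void accept(visitor& v) const = 0;
};

struct text : node {
    explicit text(std::string c) : content(std::move(c)) {}
    void accept(visitor& v) const override { v.visit(*this); }
    std::string content;
};

struct element : node {
    explicit element(std::string n) : name(std::move(n)) {}
    void accept(visitor& v) const override { v.visit(*this); }
    std::string name;
    std::vector<std::unique_ptr<node>> children;  // children live on the heap
};

// A traversal that concatenates all text content, with no type tags or switch.
struct text_collector : visitor {
    void visit(const element& e) override {
        for (const auto& child : e.children) child->accept(*this);
    }
    void visit(const text& t) override { result += t.content; }
    std::string result;
};

// Usage: build <p>Hello, <em>world</em></p> and collect its text.
std::string collect_text_demo() {
    element root("p");
    root.children.push_back(std::make_unique<text>("Hello, "));
    auto em = std::make_unique<element>("em");
    em->children.push_back(std::make_unique<text>("world"));
    root.children.push_back(std::move(em));

    text_collector collector;
    root.accept(collector);
    return collector.result;
}
```

Note that the children are held through std::unique_ptr<node>, so this design does imply heap-allocated, polymorphic nodes.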
Node types aren't (runtime-) polymorphic right now, but is that really a big deal? Polymorphism is important for extensibility. However, here the set of node types is well known (and rather limited). Making nodes polymorphic would imply that the library allocates nodes on the heap, instead of on the stack (as it now does). That could well hurt performance. I'm not sure how much of an issue that is, though.

Regards,
Stefan