
Sebastian Redl wrote:
There are two types of reader interfaces currently in use that I've found, and I've come up with a third. I wonder which of them the people on this list would prefer, and where they see the strengths and weaknesses of each. The names I've given them are my own creation.
1) The Monolithic Interface
Examples: .NET XmlReader, libxml2 XMLReader (modeled after the .NET one), Java Common API for XML Pull Parsing (XmlPull) (not to be confused with JSR 173 "StAX")
In the monolithic interface, the XML parser acts as a cursor over the event stream. You call next() and it moves to the next event in the stream. From there, you can query the event's type (usually an integral constant) and call methods to retrieve its data. All methods are always available on the object; calling one that is not appropriate for the current event (e.g. getTagName() for a Characters event) returns a null value or signals an error.
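To make the comparison concrete, here is a minimal C++ sketch of such a cursor. Everything in it is hypothetical (`monolithic_reader`, `event_type`, `raw_event` and the accessor names are not the API of any library listed above), and a pre-tokenized event list stands in for an actual XML parser. This version answers an inappropriate call, such as `name()` on a Characters event, with an empty string rather than an error:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event kinds, reported to the user as a type code.
enum class event_type { start_element, end_element, characters };

// One pre-tokenized event; a real parser would produce these lazily.
struct raw_event {
    event_type type;
    std::string name;  // meaningful only for start/end element events
    std::string text;  // meaningful only for character events
};

// The reader itself is the cursor: all accessors are always callable,
// and one that does not fit the current event returns an empty string.
class monolithic_reader {
public:
    explicit monolithic_reader(std::vector<raw_event> events)
        : events_(std::move(events)) {}

    // Move to the next event; returns false once the stream is exhausted.
    bool next() {
        if (pos_ <= events_.size()) ++pos_;
        return pos_ <= events_.size();
    }

    // Type code of the current event (requires a successful next()).
    event_type type() const {
        assert(current() != nullptr);
        return current()->type;
    }

    std::string name() const {
        const raw_event* e = current();
        return e && e->type != event_type::characters ? e->name : std::string();
    }

    std::string text() const {
        const raw_event* e = current();
        return e && e->type == event_type::characters ? e->text : std::string();
    }

private:
    // pos_ is the 1-based index of the current event; 0 means "before the first".
    const raw_event* current() const {
        return pos_ >= 1 && pos_ <= events_.size() ? &events_[pos_ - 1] : nullptr;
    }

    std::vector<raw_event> events_;
    std::size_t pos_ = 0;
};

// Typical client loop: switch on the type code, then call the accessors
// that the user knows (but the compiler does not) are valid for it.
inline std::string dump(monolithic_reader& r) {
    std::string out;
    while (r.next()) {
        switch (r.type()) {
        case event_type::start_element: out += "<" + r.name() + ">"; break;
        case event_type::end_element:   out += "</" + r.name() + ">"; break;
        case event_type::characters:    out += r.text(); break;
        }
    }
    return out;
}
```

Nothing stops client code from calling `text()` on a start-element event; the mistake only shows up as an empty result at runtime.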
I don't like the idea of an all-embracing interface that requires the user to figure out which methods are actually valid for the current type.
2) The Inheritance Interface
Examples: JSR 173 "StAX"
In the inheritance interface, the event types are modeled as a group of classes that all inherit from an Event base class. The parser acts as an iterator, Java style; calling next() returns a reference/pointer to the event object for this event. You use RTTI or a similar mechanism to find the type of the event, then cast the reference to the appropriate subclass. The subclasses then provide access to the data that is actually available for this event type.
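A comparable sketch of the inheritance style, again with hypothetical names and a canned event list instead of a parser. `next()` hands out a heap-allocated event through a `std::unique_ptr` (a C++ stand-in for the reference a Java-style iterator returns), and `dynamic_cast` serves as the RTTI check plus cast:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Polymorphic base class; the virtual destructor also enables dynamic_cast.
struct event {
    virtual ~event() = default;
};

// Each subclass exposes only the data its event type actually carries.
struct start_element : event {
    explicit start_element(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct end_element : event {
    explicit end_element(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct characters : event {
    explicit characters(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Iterator-style reader: next() returns a heap-allocated event object,
// or an empty pointer at the end of the stream.
class inheritance_reader {
public:
    explicit inheritance_reader(std::vector<std::unique_ptr<event>> events)
        : events_(std::move(events)) {}

    std::unique_ptr<event> next() {
        if (pos_ == events_.size()) return nullptr;
        return std::move(events_[pos_++]);
    }

private:
    std::vector<std::unique_ptr<event>> events_;
    std::size_t pos_ = 0;
};

// Client code: finding the dynamic type and casting is the user's job.
inline std::string dump(inheritance_reader& r) {
    std::string out;
    while (std::unique_ptr<event> e = r.next()) {
        if (auto* s = dynamic_cast<start_element*>(e.get()))
            out += "<" + s->name + ">";
        else if (auto* en = dynamic_cast<end_element*>(e.get()))
            out += "</" + en->name + ">";
        else if (auto* c = dynamic_cast<characters*>(e.get()))
            out += c->text;
        else
            assert(false && "event subclass not handled by this client");
    }
    return out;
}
```

Each accessor is now only reachable on the right type, but the chain of casts is still written (and kept complete) by hand.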
While this sounds better (each event class exposes only the data its type actually supports), it is still the user's responsibility to figure out the type and do the cast.
3) The Variant Interface
Examples: None. I believe I came up with this entirely on my own.
The variant interface seeks to combine the strengths of the other two interfaces. It uses a non-monolithic interface, that is, the parser acts like an iterator and the data is not stored within it. It does not, however, return a reference to an event object, but a boost::variant of all possible events. This avoids heap-allocating the event object, along with all the trouble that comes with it. The event type can be determined in one of three ways: by calling variant::which(), with a variant visitor (type-safe!), or with a special get_base() function that works like get() but retrieves a reference to a common base of all the variant types. (Such a function is possible, although Boost does not provide an implementation.)
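A sketch of the variant style, written against C++17 `std::variant` and `std::visit` as stand-ins for `boost::variant` and `boost::apply_visitor` so that it builds without Boost (`std::variant::index()` plays the role of `variant::which()`). The event structs, `variant_reader` and the `overloaded` helper are illustrative assumptions; `get_base()` is a hypothetical implementation of the function described above, using a visitor that converts whichever alternative is active to a reference to the shared `event_base`:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Data common to all events (here just a line number) lives in a base class.
struct event_base { int line = 0; };

struct start_element : event_base { std::string name; };
struct end_element   : event_base { std::string name; };
struct characters    : event_base { std::string text; };

// What the reader hands out: a value, so no heap allocation per event.
using xml_event = std::variant<start_element, end_element, characters>;

// Hypothetical get_base(): like std::get, but returns the common base of
// whichever alternative is currently active.
inline const event_base& get_base(const xml_event& e) {
    return std::visit([](const auto& ev) -> const event_base& { return ev; }, e);
}

// Iterator-style reader; the data lives in the returned values, not in it.
class variant_reader {
public:
    explicit variant_reader(std::vector<xml_event> events)
        : events_(std::move(events)) {}

    // The next event by value, or std::nullopt at the end of the stream.
    std::optional<xml_event> next() {
        if (pos_ == events_.size()) return std::nullopt;
        return events_[pos_++];
    }

private:
    std::vector<xml_event> events_;
    std::size_t pos_ = 0;
};

// Common idiom (not a library facility) for building a visitor from lambdas.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Type-safe client: leaving out an overload is a compile-time error.
inline std::string dump(variant_reader& r) {
    std::string out;
    while (std::optional<xml_event> e = r.next()) {
        std::visit(overloaded{
            [&](const start_element& s) { out += "<" + s.name + ">"; },
            [&](const end_element& s) { out += "</" + s.name + ">"; },
            [&](const characters& c) { out += c.text; }},
            *e);
    }
    return out;
}
```

Compared with the previous two styles, the client needs neither type codes nor casts, and `get_base()` gives uniform access to data that every event carries.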
Same here. You seem to assume that a single accessor is to be used to retrieve the current data, whether it is strongly/statically typed or not. What about an interface similar to SAX, where the user provides a set of handlers, one per type, and the reader calls the appropriate one? For example:

    void handle1(token1 const &);
    void handle2(token2 const &);
    ...
    typedef reader<handle1, handle2, ...> my_reader;
    my_reader r(filename);
    while (r.next()) r.process();

Please disregard the syntax; there are certainly multiple ways to declare and bind handlers to the reader, either at compile time or at runtime. My question is merely whether it would be useful to use typed callbacks like this. What are the pros and cons?

Note that there is room between the two extremes, i.e. a single token type vs. independent token types: all tokens can be derived from a common base that provides access to common data, so an iterator is still possible, for example to 'fast-forward' to a particular position in the stream.

Regards,
Stefan

--
...ich hab' noch einen Koffer in Berlin...
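The typed-callback idea from the reply above can be sketched in C++17 as follows. The names `token1`, `token2` and `reader` follow the pseudocode, but the rest is an assumption: rather than taking `handle1`, `handle2`, ... as separate template arguments, this hypothetical `reader` takes a single handler object with one `operator()` overload per token type, and `process()` dispatches the current token to the matching overload through an internal `std::variant` the user never sees. Because all tokens derive from a common `token_base`, the reader can also offer a handler-free `skip_to()` for fast-forwarding, in the spirit of the last paragraph:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Common base: data shared by every token (here, its offset in the stream).
struct token_base { std::size_t offset = 0; };

// Independent token types, each carrying only its own data.
struct token1 : token_base { std::string name; };  // e.g. a start tag
struct token2 : token_base { std::string text; };  // e.g. character data

// Common idiom (not a library facility) for merging lambdas into one handler.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical reader bound to its handlers at compile time: Handler must be
// callable with every token type, otherwise instantiating process() fails.
template <class Handler>
class reader {
public:
    using token = std::variant<token1, token2>;

    reader(std::vector<token> tokens, Handler h)
        : tokens_(std::move(tokens)), handler_(std::move(h)) {}

    // Move to the next token; returns false once the stream is exhausted.
    bool next() {
        if (pos_ <= tokens_.size()) ++pos_;
        return pos_ <= tokens_.size();
    }

    // Call the handler overload that matches the current token's type.
    void process() {
        assert(pos_ >= 1 && pos_ <= tokens_.size());
        std::visit(handler_, tokens_[pos_ - 1]);
    }

    // Fast-forward through the common base only, without calling handlers:
    // stops at the first token whose offset is at least `offset`.
    bool skip_to(std::size_t offset) {
        while (next()) {
            const token_base& b = std::visit(
                [](const auto& t) -> const token_base& { return t; },
                tokens_[pos_ - 1]);
            if (b.offset >= offset) return true;
        }
        return false;
    }

private:
    std::vector<token> tokens_;
    Handler handler_;
    std::size_t pos_ = 0;  // 1-based index of the current token; 0 = none yet
};
```

With handlers built via `overloaded{...}` from one lambda per token type, the driving loop is the one from the reply, `while (r.next()) r.process();`. Binding at runtime would instead store one `std::function` per token type, trading this compile-time completeness check for flexibility.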