Re: [boost] [xml] XML Reader Interface

31 Oct 2006

      Sebastian Redl wrote:
...
There are two types of reader interfaces currently in use that I've
found. I've come up with a third. I wonder which the people on this list
would prefer, where they see their weaknesses and strengths. The names
that I've given them are my own creation.
1) The Monolithic Interface
Examples: .NET XmlReader, libxml2 XmlTextReader (modeled after the .NET
one), Java Common API for XML Pull Parsing (XmlPull) (don't confuse with
JSR 173 "StAX")
In the monolithic interface, the XML parser acts as a cursor over the
event stream. You call next() and it points to the next event in the
stream. From there, you can query its type (usually some integral
constants) and call some methods to retrieve the data. All methods are
always available on the object; calling one that is not appropriate for
the current event (e.g. getTagName() for a Characters event) returns a
null value or signals an error.
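A minimal C++ sketch of the monolithic (cursor) style may help. It assumes a pre-tokenized event list instead of a real parser, and the names `cursor_reader`, `event_type`, `tag_name()` and `char_data()` are hypothetical, loosely following the description above rather than any real library's API. Every accessor is callable at any time; asking for the wrong kind of data simply yields an empty string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical integral event-type constants.
enum event_type { start_element, end_element, characters };

// One pre-tokenized event; a real reader would produce these while parsing.
struct raw_event {
    event_type type;
    std::string text;  // tag name for element events, character data otherwise
};

// The reader itself is the cursor: next() advances it, and every accessor
// is always callable, whatever the current event type is.
class cursor_reader {
public:
    explicit cursor_reader(std::vector<raw_event> events)
        : events_(std::move(events)) {}

    // Moves to the next event; returns false at end of stream.
    bool next() {
        if (next_ == events_.size()) { current_ = nullptr; return false; }
        current_ = &events_[next_++];
        return true;
    }

    event_type type() const {
        assert(current_ != nullptr);  // only valid after next() returned true
        return current_->type;
    }

    // Meaningful for element events only; returns "" for anything else.
    std::string tag_name() const {
        return current_ && current_->type != characters ? current_->text : std::string();
    }

    // Meaningful for character events only; returns "" for anything else.
    std::string char_data() const {
        return current_ && current_->type == characters ? current_->text : std::string();
    }

private:
    std::vector<raw_event> events_;
    std::size_t next_ = 0;
    const raw_event* current_ = nullptr;
};

// Usage: collect the character data inside <title> elements. The user must
// check type() to know which accessors are meaningful.
std::string titles(cursor_reader& r) {
    std::string out;
    bool in_title = false;
    while (r.next()) {
        switch (r.type()) {
        case start_element: in_title = (r.tag_name() == "title"); break;
        case end_element:   in_title = false; break;
        case characters:    if (in_title) out += r.char_data(); break;
        }
    }
    return out;
}
```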
I don't like the idea of an all-embracing interface that requires the
user to figure out which methods are actually valid for the current type.
...
2) The Inheritance Interface
Examples: JSR 173 "StAX"
In the inheritance interface, the event types are modeled as a group of
classes that all inherit from an Event base class. The parser acts as an
iterator, Java style; calling next() returns a reference/pointer to the
event object for this event. You use RTTI or a similar mechanism to find
the type of the event, then cast the reference to the appropriate
subclass. The subclasses then provide access to the data that is
actually available for this event type.
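For comparison, a rough C++ sketch of the inheritance style, assuming the reader hands out shared pointers to pre-built event objects. `event`, `event_reader` and the subclass names are hypothetical illustrations, not the StAX API. Note that the user code has to probe the concrete type with `dynamic_cast` before it can reach any data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common base class; the virtual destructor makes the hierarchy polymorphic,
// which dynamic_cast (RTTI) needs.
struct event {
    virtual ~event() = default;
};

// Each subclass exposes only the data that its event type actually has.
struct start_element : event {
    explicit start_element(std::string n) : name(std::move(n)) {}
    std::string name;
};
struct end_element : event {
    explicit end_element(std::string n) : name(std::move(n)) {}
    std::string name;
};
struct characters : event {
    explicit characters(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Iterator-style reader: next() returns a heap-allocated event object,
// or nullptr at end of stream.
class event_reader {
public:
    explicit event_reader(std::vector<std::shared_ptr<const event>> events)
        : events_(std::move(events)) {}

    std::shared_ptr<const event> next() {
        if (pos_ == events_.size()) return nullptr;
        return events_[pos_++];
    }

private:
    std::vector<std::shared_ptr<const event>> events_;
    std::size_t pos_ = 0;
};

// The user has to discover the concrete type and cast to reach the data.
std::string titles(event_reader& r) {
    std::string out;
    bool in_title = false;
    while (auto e = r.next()) {
        if (auto s = dynamic_cast<const start_element*>(e.get())) {
            in_title = (s->name == "title");
        } else if (dynamic_cast<const end_element*>(e.get())) {
            in_title = false;
        } else if (auto c = dynamic_cast<const characters*>(e.get())) {
            if (in_title) out += c->text;
        }
    }
    return out;
}
```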
While this sounds better (each event class only provides what its
event type supports), it is still the user's responsibility to
figure out the type and do the cast.
...
3) The Variant Interface
Examples: None. I believe I came up with this entirely on my own.
The variant interface seeks to combine the strengths of the other two
interfaces. It uses a non-monolithic interface, that is, the parser acts
like an iterator and the data is not stored within it. It does not
return a reference to the event object, though, but instead a
boost::variant of all possible events. This way, heap allocation of the
event object, and all the trouble that comes with it, is avoided.
The event type can be determined either by calling variant::which, or
with a variant visitor (type-safe!), or with a special get_base()
function that works like get() but can retrieve a reference to a common
base of all the variant types. (This is possible, although an
implementation does not exist in Boost.)
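A sketch of the variant style follows. It swaps in C++17 `std::variant` and `std::visit` for `boost::variant` and its static visitors (similar, but not identical, semantics); `index()` plays the role of `which()`. The event structs and `variant_reader` are hypothetical. Events are returned by value, so the event object itself needs no heap allocation (its `std::string` members still may).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct start_element { std::string name; };
struct end_element   { std::string name; };
struct characters    { std::string text; };

// The event is a plain value; no common base class is required.
using xml_event = std::variant<start_element, end_element, characters>;

class variant_reader {
public:
    explicit variant_reader(std::vector<xml_event> events)
        : events_(std::move(events)) {}

    // Returns the next event by value, or std::nullopt at end of stream.
    std::optional<xml_event> next() {
        if (pos_ == events_.size()) return std::nullopt;
        return events_[pos_++];
    }

private:
    std::vector<xml_event> events_;
    std::size_t pos_ = 0;
};

// Type-safe visitor: std::visit fails to compile if an event type
// has no matching operator().
struct title_collector {
    std::string out;
    bool in_title = false;
    void operator()(const start_element& s) { in_title = (s.name == "title"); }
    void operator()(const end_element&) { in_title = false; }
    void operator()(const characters& c) { if (in_title) out += c.text; }
};

std::string titles(variant_reader& r) {
    title_collector v;
    while (auto e = r.next()) std::visit(v, *e);  // v is visited by reference
    return v.out;
}
```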
Same here.
You seem to assume that a single accessor is to be used to retrieve the
current data, whether it is strongly / statically typed or not.

What about an interface similar to SAX, where the user provides a set
of handlers, one per type, and then the reader calls the appropriate
one? For example:

void handle1(token1 const &);
void handle2(token2 const &);
...

typedef reader<handle1, handle2, ...> my_reader;
my_reader r(filename);
while (r.next()) r.process();

Please disregard the syntax; there are certainly multiple ways to
declare and bind handlers to the reader, either at compile- or at
runtime. My question is merely about whether it would be useful to
use typed callbacks like this.
What are the pros and cons?
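One possible compile-time binding of the typed callbacks, sketched in C++17 so that the `reader<handle1, handle2, ...>` shape above actually compiles. It assumes handlers are plain functions of the form `void(const T&)` passed as `auto` non-type template parameters, and that the reader stores the current token internally as a `std::variant`; `reader`, `token`, `handled_type` and the `on_*` handlers are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct start_element { std::string name; };
struct end_element   { std::string name; };
struct characters    { std::string text; };
using token = std::variant<start_element, end_element, characters>;

// Extracts T from a handler of type void(*)(const T&).
template <typename F> struct handled_type;
template <typename T> struct handled_type<void (*)(const T&)> { using type = T; };

// Reader with one handler per token type, bound at compile time.
// Tokens without a matching handler are silently skipped.
template <auto... Handlers>
class reader {
    template <auto H>
    void dispatch() const {
        using T = typename handled_type<decltype(H)>::type;
        // get_if returns nullptr on a type mismatch (or a null pointer).
        if (const T* t = std::get_if<T>(current_)) H(*t);
    }

public:
    explicit reader(std::vector<token> tokens) : tokens_(std::move(tokens)) {}

    // Advances to the next token; returns false at end of stream.
    bool next() {
        if (pos_ == tokens_.size()) return false;
        current_ = &tokens_[pos_++];
        return true;
    }

    // Calls every handler whose parameter type matches the current token.
    void process() const { (dispatch<Handlers>(), ...); }

private:
    std::vector<token> tokens_;
    std::size_t pos_ = 0;
    const token* current_ = nullptr;
};

// Example handlers, one per token type (state kept in globals for brevity).
std::string g_out;
bool g_in_title = false;
void on_start(const start_element& s) { g_in_title = (s.name == "title"); }
void on_end(const end_element&) { g_in_title = false; }
void on_chars(const characters& c) { if (g_in_title) g_out += c.text; }

using my_reader = reader<on_start, on_end, on_chars>;
```

Usage mirrors the loop above: `my_reader r(tokens); while (r.next()) r.process();`. Binding at runtime instead (for example through `std::function` members) would allow changing handlers on the fly, at the cost of an indirect call per token.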

Note that there is room between the two extremes, i.e. a single
token type vs. independent token types: All tokens can be derived
from a common base that provides access to common data, so an
iterator is still possible, for example to 'fast-forward' to
a particular position in the stream.
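A sketch of that middle ground, under the assumption that the shared data is a byte offset into the document (the source does not say what the common base would hold). The hypothetical `token_base` is inherited by every token type, the tokens are stored in a `std::variant`, and generic code reaches the base through `std::visit`, so it can fast-forward without knowing any concrete token type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Data common to all tokens (assumed here: byte offset in the document).
struct token_base { std::size_t offset; };

struct start_element : token_base { std::string name; };
struct end_element   : token_base { std::string name; };
struct characters    : token_base { std::string text; };
using token = std::variant<start_element, end_element, characters>;

// Returns the common base of whichever alternative is active.
const token_base& base_of(const token& t) {
    return std::visit([](const token_base& b) -> const token_base& { return b; }, t);
}

// Fast-forwards to the first token starting at or after 'target',
// using only the common base; returns end() if there is none.
std::vector<token>::const_iterator
fast_forward(const std::vector<token>& tokens, std::size_t target) {
    return std::find_if(tokens.begin(), tokens.end(), [target](const token& t) {
        return base_of(t).offset >= target;
    });
}
```

For `<doc><title>Hello</title>`, the tokens start at offsets 0, 5, 12 and 17, so fast-forwarding to offset 10 lands on the "Hello" character token.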

Regards,
		Stefan

-- 

      ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld