Re: [boost] [xml] XML Reader Interface

29 Oct 2006

Sebastian Redl wrote:
...
> Once again I'm turning to the list for discussion about a design issue
> in the XML library. This time I hope to avoid any discussion about the
> implementation of the library and focus on the interface only.
Have you thought about asynchronous parsing?
How could that be available?
...
> The interface in question is the reader interface, also known as the pull
> interface. Like SAX, the pull interface is an event-based interface.
> There are a few event types (roughly, StartElement, EndElement,
> Characters, and a few more for other XML features), all of which
> come with some additional data: the element name, the character data, etc.
> There are two types of reader interfaces currently in use that I've
> found. I've come up with a third. I wonder which the people on this list
> would prefer, and where they see their weaknesses and strengths. The names
> that I've given them are my own creation.
There are of course variations, like the one Matt Gruenke described.
You could provide the inheritance interface but with the objects
actually owned by the parser (making it somewhat like the monolithic
interface), and use a variant to store those objects on the stack.

This idea actually doesn't look bad, since you get the second
solution without its drawbacks, and only the advantages of
the first one (if you provide the appropriate tools to allow copy
construction of the referenced objects, that is).
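To check that I understand the idea, here is a rough sketch in C++17, with std::variant standing in for boost::variant. Everything in it (Event, StartElement, EventSlot, Reader, collect_starts) is made up for illustration. The reader replays a canned list of events instead of parsing anything, so only the ownership model is shown:

- a single reader-owned slot is overwritten on each call to next();
- the event is handed out through its base class;
- users who want to keep an event copy-construct it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical event classes for the inheritance interface.
struct Event {
    virtual ~Event() = default;
};
struct StartElement : Event {
    std::string name;
    explicit StartElement(std::string n) : name(std::move(n)) {}
};
struct EndElement : Event {
    std::string name;
    explicit EndElement(std::string n) : name(std::move(n)) {}
};
struct Characters : Event {
    std::string text;
    explicit Characters(std::string t) : text(std::move(t)) {}
};

// One slot big enough for any event; std::monostate means "no event".
using EventSlot = std::variant<std::monostate, StartElement, EndElement, Characters>;

// Hypothetical reader: a real one would tokenize XML; this one replays
// a canned list of events so the ownership model can be shown on its own.
class Reader {
public:
    explicit Reader(std::vector<EventSlot> canned) : canned_(std::move(canned)) {}

    // Returns the next event through its base class, or nullptr at the end.
    // The pointee lives inside the reader and is overwritten by the next call.
    const Event* next() {
        assert(pos_ <= canned_.size());  // invariant: never past the input
        if (pos_ == canned_.size()) {
            current_ = std::monostate{};
            return nullptr;
        }
        // Reuse the same slot: the event object itself is not heap-allocated
        // (its strings still may be).
        current_ = canned_[pos_++];
        return std::visit([](const auto& e) -> const Event* {
            if constexpr (std::is_same_v<std::decay_t<decltype(e)>, std::monostate>)
                return nullptr;
            else
                return &e;
        }, current_);
    }

private:
    std::vector<EventSlot> canned_;
    std::size_t pos_ = 0;
    EventSlot current_;
};

// Usage: keep copies of every StartElement, since the reader's slot is reused.
std::vector<StartElement> collect_starts(Reader& r) {
    std::vector<StartElement> kept;
    while (const Event* e = r.next()) {
        if (auto* s = dynamic_cast<const StartElement*>(e))
            kept.push_back(*s);  // copy construction out of the reader-owned slot
    }
    return kept;
}
```

The pointer returned by next() is only valid until the following call. That is why the copy-construction tools matter.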

I don't understand, though, whether you think it is a good thing for
the parser to contain its state or not.

Anyway, whatever is chosen, I think using a variant with the ability to
get a base will be a good idea somewhere.
This provides type-safe `which' and visitors, as well as RTTI for those who
want it.
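Something like the following, again assuming C++17 std::variant rather than boost::variant. The event types and the helpers are hypothetical:

- get_base() exposes the common base of whichever alternative is active;
- index() plays the role of boost::variant's which();
- a visitor gives compile-time checked dispatch;
- typeid and dynamic_cast on the base serve those who prefer RTTI.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <utility>
#include <variant>

// Hypothetical event types sharing a polymorphic base.
struct Event {
    virtual ~Event() = default;
};
struct StartElement : Event {
    std::string name;
    explicit StartElement(std::string n) : name(std::move(n)) {}
};
struct EndElement : Event {
    std::string name;
    explicit EndElement(std::string n) : name(std::move(n)) {}
};
struct Characters : Event {
    std::string text;
    explicit Characters(std::string t) : text(std::move(t)) {}
};

using EventVariant = std::variant<StartElement, EndElement, Characters>;

// "Get a base": every alternative derives from Event, so the active one
// can be viewed through a const Event& (for virtual calls or RTTI).
const Event& get_base(const EventVariant& v) {
    return std::visit([](const auto& e) -> const Event& { return e; }, v);
}

// Type-safe visitor: std::visit fails to compile if an alternative is not handled.
struct Describe {
    std::string operator()(const StartElement& e) const { return "start " + e.name; }
    std::string operator()(const EndElement& e) const { return "end " + e.name; }
    std::string operator()(const Characters& e) const { return "text " + e.text; }
};

std::string describe(const EventVariant& v) { return std::visit(Describe{}, v); }

// RTTI route: typeid on the base reference yields the dynamic type.
bool is_start_rtti(const EventVariant& v) {
    return typeid(get_base(v)) == typeid(StartElement);
}
```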

Examples of how some basic operations could be done with those
interfaces would come in handy for comparing them, for those of us who,
like me, don't have much experience with parsing XML.
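As a starting point, here is the kind of basic operation I mean: collecting the text of every <title> element with a pull loop. PullReader, EventKind and the canned event list are hypothetical, and there is no real tokenizer. The reader simply exposes the current event's kind and data through accessors. The same task could be written against each proposed interface to compare them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class EventKind { StartElement, EndElement, Characters, EndOfDocument };

// Hypothetical pull reader: the reader object itself exposes the current
// event's kind and data. A canned event list stands in for real parsing.
class PullReader {
public:
    struct Raw {
        EventKind kind;
        std::string data;  // element name or character data
    };

    explicit PullReader(std::vector<Raw> events) : events_(std::move(events)) {}

    // Advances to the next event; returns false once the document is exhausted.
    bool next() {
        if (pos_ < events_.size()) {
            cur_ = events_[pos_++];
            return true;
        }
        cur_ = {EventKind::EndOfDocument, {}};
        return false;
    }

    EventKind kind() const { return cur_.kind; }
    const std::string& name() const { return cur_.data; }  // Start/EndElement only
    const std::string& text() const { return cur_.data; }  // Characters only

private:
    std::vector<Raw> events_;
    std::size_t pos_ = 0;
    Raw cur_{EventKind::EndOfDocument, {}};
};

// Basic operation: gather the character data of every <title> element.
// The loop drives parsing, so the state is just local variables, and
// text split across several Characters events is concatenated.
std::vector<std::string> titles(PullReader& r) {
    std::vector<std::string> out;
    bool in_title = false;
    while (r.next()) {
        switch (r.kind()) {
        case EventKind::StartElement:
            if (r.name() == "title") {
                in_title = true;
                out.emplace_back();
            }
            break;
        case EventKind::EndElement:
            if (r.name() == "title") in_title = false;
            break;
        case EventKind::Characters:
            if (in_title) out.back() += r.text();
            break;
        case EventKind::EndOfDocument:
            break;
        }
    }
    return out;
}
```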
...
> Independently of the type of interface chosen, another issue is
> important: the scope of the interface. Should it report all XML events,
> including those coming from DTD parsing?
Validation is quite costly: a way to disable it would be nice. And it's
not just DTDs; there are other means of validation.

However, without validation you don't know which attribute is the `id'
attribute, and that is quite annoying. It seems that's why xml:id was
introduced.
Browser engines like Gecko don't validate, but they know which attributes
are IDs for each namespace they handle. Maybe something similar could be
done, using either static data or user input.
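A minimal sketch of the static-data/user-input idea. It assumes element and attribute names arrive already split into namespace URI and local name. IdAttributeRegistry and default_registry() are hypothetical. xml:id is treated as an ID everywhere, as the xml:id recommendation intends. The XHTML entry is just one example of built-in static data; users can add their own.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <utility>

const std::string xml_ns = "http://www.w3.org/XML/1998/namespace";

// Hypothetical registry of ID-typed attributes, filled from static data or
// by the user, so IDs can be recognised without running a validator.
class IdAttributeRegistry {
public:
    // Declares attribute `attr` (no namespace) on element `elem` in namespace
    // `ns` to be an ID; an empty `elem` means every element of that namespace.
    void add(std::string ns, std::string elem, std::string attr) {
        ids_.emplace(std::move(ns), std::move(elem), std::move(attr));
    }

    // An unprefixed attribute has an empty namespace URI.
    bool is_id(const std::string& elem_ns, const std::string& elem,
               const std::string& attr_ns, const std::string& attr) const {
        if (attr_ns == xml_ns && attr == "id")
            return true;  // xml:id is an ID wherever it appears
        if (!attr_ns.empty())
            return false;
        return ids_.count({elem_ns, elem, attr}) > 0 ||
               ids_.count({elem_ns, "", attr}) > 0;
    }

private:
    std::set<std::tuple<std::string, std::string, std::string>> ids_;
};

// Example of static data: the id attribute on any XHTML element.
IdAttributeRegistry default_registry() {
    IdAttributeRegistry r;
    r.add("http://www.w3.org/1999/xhtml", "", "id");
    return r;
}
```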
...
> Should this be a user choice,
Don't validate by default, and validate only if the user asks for it.
That seems like the better choice to me.
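In code, that default might look like the following. ReaderOptions and check_elements() are hypothetical. The validator is an arbitrary callback, so it could wrap a DTD, a schema, or something else. The only point is that no validation work is done unless validate is set to true.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical reader options: validation is opt-in.
struct ReaderOptions {
    bool validate = false;
    // Any validation mechanism plugged in as a callback; it returns an error
    // message, or an empty string if the element is acceptable.
    std::function<std::string(const std::string& element)> validator;
};

// Hypothetical processing step that validates element names only on request.
std::vector<std::string> check_elements(const std::vector<std::string>& elements,
                                        const ReaderOptions& opts) {
    std::vector<std::string> errors;
    if (!opts.validate || !opts.validator)
        return errors;  // default path: no validation cost at all
    for (const auto& e : elements) {
        std::string msg = opts.validator(e);
        if (!msg.empty()) errors.push_back(msg);
    }
    return errors;
}
```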
...
> Should errors be reported as error events, or as
> exceptions?
We expect errors to happen, so we shouldn't use exceptions for them by default.
We could allow exceptions to be toggled on, though, for users who don't want
to check for errors themselves and aren't looking for maximum efficiency.
Then again, maybe such users should be using a higher-level API.
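A rough sketch of that toggle. The names are hypothetical throughout (ErrorMode, Kind, Event, parse_error, Reader, count_errors), and a canned event list replaces a real parser. By default an error is just another event. ErrorMode::Exceptions turns it into a thrown parse_error instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class ErrorMode { Events, Exceptions };
enum class Kind { StartElement, EndElement, Error, End };

struct Event {
    Kind kind;
    std::string data;  // element name, or the message for Kind::Error
};

// Hypothetical exception type, thrown only in ErrorMode::Exceptions.
struct parse_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical reader over canned events; a real one would detect errors
// while tokenizing.
class Reader {
public:
    explicit Reader(std::vector<Event> events, ErrorMode mode = ErrorMode::Events)
        : events_(std::move(events)), mode_(mode) {}

    Event next() {
        if (pos_ == events_.size()) return {Kind::End, {}};
        Event e = events_[pos_++];
        if (e.kind == Kind::Error && mode_ == ErrorMode::Exceptions)
            throw parse_error(e.data);  // opt-in for users who don't want to check
        return e;  // default: the error is reported as an ordinary event
    }

private:
    std::vector<Event> events_;
    std::size_t pos_ = 0;
    ErrorMode mode_;
};

// Event mode: errors show up in the normal loop and are simply counted.
std::size_t count_errors(Reader& r) {
    std::size_t n = 0;
    for (Event e = r.next(); e.kind != Kind::End; e = r.next())
        if (e.kind == Kind::Error) ++n;
    return n;
}
```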
...
> How about warnings:
> exceptions are inappropriate for them.
> Should it be possible to disable
> them completely?
In exception mode, it should be possible to ignore warnings, and that
might well be the default.
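For example, consider a hypothetical exception mode where errors throw. Warnings never throw: they are filtered out by default and delivered as ordinary events only on request. The names (ExceptionModeOptions, report_warnings, Reader, kinds) are made up, and the reader replays canned events.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Kind { StartElement, Warning, Error, End };

struct Event {
    Kind kind;
    std::string data;
};

struct parse_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical exception-mode options: ignoring warnings is the default.
struct ExceptionModeOptions {
    bool report_warnings = false;
};

class Reader {
public:
    Reader(std::vector<Event> events, ExceptionModeOptions opts)
        : events_(std::move(events)), opts_(opts) {}

    // Errors throw; warnings are skipped unless the user asked for them.
    Event next() {
        while (pos_ < events_.size()) {
            Event e = events_[pos_++];
            if (e.kind == Kind::Error) throw parse_error(e.data);
            if (e.kind == Kind::Warning && !opts_.report_warnings) continue;
            return e;
        }
        return {Kind::End, {}};
    }

private:
    std::vector<Event> events_;
    std::size_t pos_ = 0;
    ExceptionModeOptions opts_;
};

// Collects the kinds of all events the reader hands out.
std::vector<Kind> kinds(Reader& r) {
    std::vector<Kind> out;
    for (Event e = r.next(); e.kind != Kind::End; e = r.next())
        out.push_back(e.kind);
    return out;
}
```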

loufoque