
Sebastian Redl wrote:
Once again I'm turning to the list for discussion about a design issue in the XML library. This time I hope to avoid any discussion about the implementation of the library and focus on interface only.
Have you thought about asynchronous parsing? How could that be made available?
The interface in question is the reader interface, also known as pull interface. Like SAX, the pull interface is an event-based interface. There are a few event types (roughly, StartElement, EndElement, Characters, and a few more for other XML features), all of which come with some additional data: the element name, the character data, etc.
There are two types of reader interfaces currently in use that I've found. I've come up with a third. I wonder which the people on this list would prefer, where they see their weaknesses and strengths. The names that I've given them are my own creation.
There are of course variations, like the one Matt Gruenke described. You could provide the inheritance interface but with the objects actually owned by the parser (making it somewhat like the monolithic interface), and use a variant to store those objects on the stack. This idea actually doesn't look bad: you get the second solution without its drawbacks and keep the advantages of the first (provided you supply the appropriate tools to allow copy construction of the referenced objects, that is). I don't understand, though, whether you mean that the parser containing its state is a good thing or not.
Anyway, whatever is chosen, I think using a variant with the ability to get a base will be a good idea somewhere. This provides both a type-safe `which' and visitors, plus RTTI for those who want it.
Examples of how some basic operations could be done with those interfaces would come in handy for comparing them, for those who, like me, don't have much experience with parsing XML.
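For what it's worth, here is roughly how I picture the variant-with-a-base variation. It is only a sketch under my own assumptions: every name in it (event_base, start_element, fake_reader, base_of, collect_text) is made up for illustration, std::variant stands in for whichever variant type would really be used, and the "parser" just replays the events of <a>hi</a>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical names throughout; nothing here is the proposed library's API.
struct event_base {
    virtual ~event_base() = default;
    virtual const char* kind() const = 0;
};

struct start_element : event_base {
    std::string name;
    explicit start_element(std::string n) : name(std::move(n)) {}
    const char* kind() const override { return "start"; }
};

struct end_element : event_base {
    std::string name;
    explicit end_element(std::string n) : name(std::move(n)) {}
    const char* kind() const override { return "end"; }
};

struct characters : event_base {
    std::string text;
    explicit characters(std::string t) : text(std::move(t)) {}
    const char* kind() const override { return "characters"; }
};

// std::variant stands in for whichever variant type would actually be used.
using event = std::variant<start_element, end_element, characters>;

// "Get a base": access any event through the common base class.
inline const event_base& base_of(const event& e) {
    return std::visit(
        [](const auto& x) -> const event_base& { return x; }, e);
}

// Helper for building a visitor out of lambdas.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Stand-in for a parser: replays the events of <a>hi</a>.
class fake_reader {
public:
    fake_reader()
        : events_{event{start_element{"a"}}, event{characters{"hi"}},
                  event{end_element{"a"}}} {}

    // Returns the next event, owned by the reader, or nullptr at the end.
    const event* next() {
        assert(pos_ <= events_.size());
        return pos_ < events_.size() ? &events_[pos_++] : nullptr;
    }

private:
    std::vector<event> events_;
    std::size_t pos_ = 0;
};

// A basic operation: concatenate all character data in the document.
inline std::string collect_text(fake_reader& r) {
    std::string out;
    while (const event* e = r.next()) {
        std::visit(overloaded{[&out](const characters& c) { out += c.text; },
                              [](const auto&) {}},
                   *e);
    }
    return out;
}
```

The events stay owned by the reader, so the user only ever holds a pointer that is valid until the next call to next(). Someone who prefers virtual dispatch can go through base_of() and ignore the variant entirely.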
Independently of the type of interface chosen, another issue is important: the scope of the interface. Should it report all XML events, including those coming from DTD parsing?
Validation is quite costly: a way to turn it off would be nice. And it's not just DTD, there are other means of validation. However, without validation you don't know which attribute is the `id' attribute, which is quite annoying. It seems that's why they introduced xml:id. Browser engines like Gecko don't validate, but they know what the id attributes are for each namespace that they handle. Maybe something similar could be done, be it with static data or user input.
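To illustrate the "static data or user input" idea, a minimal sketch could look like the following. The class id_attribute_registry and its members are hypothetical names of mine. I am assuming that ID-ness is keyed on the namespace of the element carrying an unqualified attribute, and that xml:id is always an ID.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: lets a non-validating reader know which
// attributes are IDs without reading a DTD.
class id_attribute_registry {
public:
    // Declare that unqualified attribute `attr_local` is an ID on
    // elements in namespace `element_ns` (static data or user input).
    void declare(const std::string& element_ns, std::string attr_local) {
        assert(!attr_local.empty());
        by_element_ns_[element_ns].insert(std::move(attr_local));
    }

    bool is_id(const std::string& element_ns, const std::string& attr_ns,
               const std::string& attr_local) const {
        // xml:id is an ID regardless of the element it appears on.
        if (attr_ns == xml_ns() && attr_local == "id") return true;
        // Otherwise only unqualified attributes are looked up.
        if (!attr_ns.empty()) return false;
        auto it = by_element_ns_.find(element_ns);
        return it != by_element_ns_.end() && it->second.count(attr_local) != 0;
    }

    static const std::string& xml_ns() {
        static const std::string ns = "http://www.w3.org/XML/1998/namespace";
        return ns;
    }

private:
    std::map<std::string, std::set<std::string>> by_element_ns_;
};
```

A user handling XHTML would call declare("http://www.w3.org/1999/xhtml", "id") once, and the library could ship a few such declarations as defaults.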
Should this be a user choice,
Don't validate by default, and do it if the user asks for it. It seems like the better choice to me.
Should errors be reported as error events, or as exceptions?
We expect errors to happen, so we shouldn't use exceptions. We could allow them to be toggled on though, for users that don't want to check for such things and are not looking for super efficiency. Maybe they should be using a higher level API then though.
How about warnings: exceptions are inappropriate for them. Should it be possible to disable them completely?
In exception mode, it should be possible to ignore warnings, and maybe that should be the default.
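Roughly, the toggles I have in mind for errors and warnings could be sketched like this. The names (reader_options, checked_reader, parse_error) are invented for the example, and the reader simply replays a prepared list of events instead of parsing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class severity { none, warning, error };

// Hypothetical event record: ordinary events have severity::none.
struct parse_event {
    severity level;
    std::string text;
};

struct parse_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct reader_options {
    bool throw_on_error = false;  // default: errors arrive as events
    bool report_warnings = true;  // set to false to drop warnings entirely
};

// Stand-in reader that replays a prepared event list.
class checked_reader {
public:
    checked_reader(std::vector<parse_event> input, reader_options opts)
        : input_(std::move(input)), opts_(opts) {}

    // Fills `out` and returns true, or returns false at end of input.
    bool next(parse_event& out) {
        assert(pos_ <= input_.size());
        while (pos_ < input_.size()) {
            const parse_event& e = input_[pos_++];
            if (e.level == severity::warning && !opts_.report_warnings)
                continue;  // warnings are never thrown, only skipped
            if (e.level == severity::error && opts_.throw_on_error)
                throw parse_error(e.text);
            out = e;
            return true;
        }
        return false;
    }

private:
    std::vector<parse_event> input_;
    reader_options opts_;
    std::size_t pos_ = 0;
};
```

With the defaults, the user checks the level of each event. With throw_on_error set and report_warnings cleared, the loop body only ever sees ordinary events, and errors surface as parse_error.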