Re: [boost] RFC: Boost.XML API prototype in the sandbox

14 Jul 2007


      Stefan Seefeld wrote:
...
The reason to delegate to a factory is to let it do a lot of resource
management that thus can be hidden from the user. There is a lot to be
considered, as each node lives in a particular context given by the document
as well as its position in it (think of namespaces, for example).
OK, makes sense.
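
To make the factory argument concrete, here is a minimal sketch of the idea, not the sandbox API. All names in it (element, document, append_element, declare_namespace, lookup_namespace) are hypothetical. It assumes that nodes can only be created through their parent, so every node has an owner and a position from which namespace bindings can be resolved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the Boost.XML sandbox API.
class element {
public:
    // The parent acts as the factory: the child is created in place,
    // so it always has an owner and a position in the tree.
    element& append_element(std::string name) {
        children_.push_back(
            std::unique_ptr<element>(new element(std::move(name), this)));
        return *children_.back();
    }

    void declare_namespace(std::string prefix, std::string uri) {
        namespaces_.emplace_back(std::move(prefix), std::move(uri));
    }

    // Namespace lookup depends on position: walk up through the ancestors.
    std::string lookup_namespace(const std::string& prefix) const {
        for (const element* e = this; e != nullptr; e = e->parent_) {
            for (const auto& ns : e->namespaces_) {
                if (ns.first == prefix) return ns.second;
            }
        }
        return std::string();  // prefix not bound in this context
    }

    const std::string& name() const { return name_; }

private:
    friend class document;
    // Private constructor: users cannot create free-standing nodes.
    element(std::string name, element* parent)
        : name_(std::move(name)), parent_(parent) {}

    std::string name_;
    element* parent_;
    std::vector<std::pair<std::string, std::string>> namespaces_;
    std::vector<std::unique_ptr<element>> children_;
};

class document {
public:
    explicit document(std::string root_name)
        : root_(new element(std::move(root_name), nullptr)) {}
    element& root() { return *root_; }

private:
    std::unique_ptr<element> root_;  // the document owns the whole tree
};

// Usage: the child resolves a prefix declared on its ancestor.
std::string demo_namespace_lookup() {
    document doc("catalog");
    doc.root().declare_namespace("bk", "urn:example:books");
    element& item = doc.root().append_element("bk:item");
    return item.lookup_namespace("bk");
}
```

Because the constructor is private, the user never handles ownership directly, which is the kind of resource management the factory is meant to hide.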
...
Sorry, that argument I don't accept. Yes, I deliberately chose not to
use the API as obtained from using the CORBA C++ bindings of the OMG IDL DOM.
The hope is to get something better, much more naturally tied to modern
C++ idioms. Whether or not I achieve that is to be discussed, and can be
criticized, but the lack of conformance to existing DOM APIs in itself
is hardly an argument worth debating.
But I don't mean the argument to stand by itself. As I said, you
diverged slightly from the DOM API. The problem that I see is that it
neither matches existing knowledge, nor brings sufficient advantage to
justify this loss. In other words, if you leave the beaten track, you
shouldn't walk right alongside it, among stones but without any
advantage - you should at least take a shortcut through the woods.
...
OK, the API represents the Infoset, and thus has no idea of what an entity
is. I'm not sure whether that would be worth adding. And if so, it may be
some hook into the XML writer (the XML parser already has it).
Possibly, yes. Such low-level functionality might indeed better be
limited to the low-level stream representation.
On the other hand, rare as their use is, there are also unparsed
entities. They very much exist at the high-level representation. And not
all parsed entities can be expanded, especially by non-validating parsers.
...
I don't understand what you are aiming at in your comment about the
'document schema'.
I mean that there is no way in the current API to obtain information
about the document type, beyond its name and location.
...
That's an implementation detail (IMO).
No, it's not. It's a matter of public derivation of classes and thus
very much an interface issue.
Semantically, a text node and
a cdata node are distinct,
Also not true, at least as far as Infoset is concerned. See also
Appendix D of the Infoset spec, item 19.
 and so visitors shouldn't give users access
to a cdata node as a text node. (And what else would the ISA relationship
be good for?)
But that's exactly what they should do (if the user wants to ignore the
difference). CDATA, as I said before, is a serialization issue, and
completely irrelevant to a user who just wants to know what text the
document contains.
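
A rough sketch of what I mean, under assumed names (text, cdata, visitor and text_collector are hypothetical, not the classes in the sandbox): cdata derives publicly from text, and the visitor's cdata overload forwards to the text overload by default. A user who does not care about the distinction overrides only the text overload and still sees all character data.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical classes; not the Boost.XML sandbox API.
class text {
public:
    explicit text(std::string content) : content_(std::move(content)) {}
    virtual ~text() = default;
    const std::string& content() const { return content_; }

private:
    std::string content_;
};

// Public derivation: a cdata node is usable wherever a text node is.
class cdata : public text {
public:
    using text::text;  // inherit the constructor
};

struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const text& node) = 0;
    // Default: treat CDATA as plain text. Override only if the
    // serialization detail actually matters to the caller.
    virtual void visit(const cdata& node) {
        visit(static_cast<const text&>(node));
    }
};

// A user who only wants the character data overrides one overload.
struct text_collector : visitor {
    using visitor::visit;  // keep the cdata overload visible
    void visit(const text& node) override { collected += node.content(); }
    std::string collected;
};

// Usage: both node kinds end up in the same string.
std::string demo_collect() {
    text plain("a < b, ");
    cdata section("c < d");
    text_collector v;
    v.visit(plain);
    v.visit(section);  // forwarded to visit(const text&)
    return v.collected;
}
```

A writer that wants to preserve CDATA sections can still override the cdata overload; everyone else gets the Infoset view for free.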
...
I'm sure this can be refined. (In fact, I don't think DTDs will play any
significant role in the future, as other document type definitions become
more popular, such as RELAX NG).
True. Perhaps an API for generalized schema access can be devised.

Sebastian Redl