
On 03/22/2010 04:35 PM, Phil Endecott wrote:
Hi Stefan,
First let me say that I fully understand that there are many different applications of XML. I get the feeling that you and I have probably encountered different subsets of them. My belief is that there are different legitimate types of XML library to support the different kinds of application.
While I agree with that, that wasn't quite my point. Rather, I tried to point out that you couldn't only support a subset of XML, and still claim to provide an XML library.
How does it deal with input needing "preprocessing", such as entity substitution, or (X)inclusion?
"it" here meaning rapidxml. In one mode of operation, it replaces entities (e.g. &lt;) in place during parsing; this obviously breaks the idea of not using much RAM, since the mmapped file will copy-on-write pages as this happens. In another mode it doesn't do this and leaves it as a job for the user.
If the user is exposed to it, I would argue this is not a sufficient API to call itself "XML bindings". The spec has some rather specific discussion on what ought to be done at parsing, and what the result would be (e.g. http://www.w3.org/TR/xml-infoset/). I strongly object to an "XML library" that offers something else. (To be clear: I certainly don't object to such libraries in themselves, but please don't confuse "XML" with "XML-like".)
In my library I have an iterator that processes a text node decoding entities as they are encountered. This currently only recognises the "default" entities i.e. lt, gt, amp, quot, apos and numerics. It would be possible to extend this to decode entities declared in the document, if that were necessary, but it's not something I've ever needed to do.
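A minimal sketch of that kind of decoding, for illustration only. It is not code from either library: `decode_text`, `decode_reference` and `append_utf8` are hypothetical names, the output is assumed to be UTF-8, and unrecognised references are copied through verbatim (a conforming processor would have to resolve or reject them instead). Character validity of numeric references (surrogates, forbidden control characters) is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Append the UTF-8 encoding of code point cp to out.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode the text between '&' and ';'. Appends to out and returns true on
// success; leaves out untouched and returns false if it is not recognised.
inline bool decode_reference(std::string_view name, std::string& out) {
    if (name == "lt")   { out += '<';  return true; }
    if (name == "gt")   { out += '>';  return true; }
    if (name == "amp")  { out += '&';  return true; }
    if (name == "quot") { out += '"';  return true; }
    if (name == "apos") { out += '\''; return true; }
    if (name.size() >= 2 && name[0] == '#') {
        const bool hex = (name[1] == 'x');
        const std::string_view digits = name.substr(hex ? 2 : 1);
        if (digits.empty()) return false;
        std::uint32_t cp = 0;
        for (char c : digits) {
            std::uint32_t d;
            if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
            else if (hex && c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
            else if (hex && c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
            else return false;
            cp = cp * (hex ? 16u : 10u) + d;
            if (cp > 0x10FFFF) return false;  // outside the Unicode range
        }
        append_utf8(out, cp);
        return true;
    }
    return false;  // e.g. an entity declared in a DTD: not handled here
}

// Decode the predefined entities and numeric character references in raw.
inline std::string decode_text(std::string_view raw) {
    std::string out;
    out.reserve(raw.size());
    std::size_t i = 0;
    while (i < raw.size()) {
        if (raw[i] == '&') {
            const std::size_t semi = raw.find(';', i + 1);
            if (semi != std::string_view::npos &&
                decode_reference(raw.substr(i + 1, semi - i - 1), out)) {
                i = semi + 1;
                continue;
            }
        }
        out += raw[i++];  // ordinary character, or an unrecognised reference
    }
    return out;
}
```

With this sketch, `decode_text("a &lt; b")` yields `a < b`. Because the decoded text goes into a separate string, the (possibly mmapped) input is never written to.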
Fine. Again, the XML spec clearly defines when and how entities ought to be handled (http://www.w3.org/TR/REC-xml/#entproc). And to the degree that this processing is specified, an XML library ought to honor it.
I believe that a lot of XML features like entity declarations and namespaces declared not in the root element are painful precisely because they are tedious to implement, detrimental to performance, and never used in real-world XML documents. My guess is that you would disagree with that.
I don't disagree, but I think that the world doesn't need yet another library that supports some Not-Quite-XML.
Neither rapidxml nor my library supports xinclude. In my case, I can imagine adding it by modifying the element iterator such that dereferencing an xi:include element would open the referenced document and return its root element.
That, too, is not conforming to the XML spec (http://www.w3.org/TR/xinclude/#processing).
Also, this clearly only works with immutable input.
I think rapidxml lets you modify a document; it must allocate storage for the new strings somewhere and update its tree to point to them. My library does not allow this. I don't think I've ever needed to modify an XML document: I have only either read in or written out a file.
Again: that's fine, and I agree it would be great for a boost.xml library to optimize for that case. However, I don't think it should optimize for it by disallowing the infoset to be modified.
Default attribute values defined in a DTD are an excellent example of an XML misfeature not used in any XML application that I care about that simply result in XML processors being more complex and slower than they would otherwise need to be. (Please feel free to list any XML applications that make use of them.)
Same argument. You may not care, but others do.
However, I wouldn't say that these features are fundamentally incompatible with my approach in this library. It's only necessary that when you look up an attribute, the returned range somehow includes pseudo-elements corresponding to the default attributes.
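A rough sketch of such a lookup, under stated assumptions. `Element`, `AttributeDefaults` and `effective_attributes` are hypothetical names, not part of either library, and the defaults table is assumed to have been filled from the DTD's attribute-list declarations by some earlier step that is not shown.

```cpp
#include <map>
#include <string>

// Hypothetical element type: a name plus the attributes written in the document.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Hypothetical table of DTD defaults: element name -> (attribute name -> default value).
using AttributeDefaults =
    std::map<std::string, std::map<std::string, std::string>>;

// Return the attributes a caller should see: the explicitly specified ones,
// plus any declared default whose attribute was not specified.
inline std::map<std::string, std::string>
effective_attributes(const Element& e, const AttributeDefaults& defaults) {
    std::map<std::string, std::string> result = e.attributes;
    const auto it = defaults.find(e.name);
    if (it != defaults.end()) {
        // Range insert keeps existing keys, so explicit values take precedence.
        result.insert(it->second.begin(), it->second.end());
    }
    return result;
}
```

This version copies the attributes into a new map for clarity; an implementation that avoids allocation would more likely expose a range that walks the explicit attributes and then the unused defaults.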
I certainly expect an attribute iterator to make no distinction between explicitly specified attributes and default attributes. The XML spec has a clear definition of an InfoSet, and what of an XML file actually is semantically relevant and what is not. I want boost.xml to honor those semantics.
Thanks,
Stefan
--
...ich hab' noch einen Koffer in Berlin...