Re: [boost] [GSoC] Boost.XML

22 Mar 2010


      On 03/22/2010 04:35 PM, Phil Endecott wrote:
...
Hi Stefan,
First let me say that I fully understand that there are many different 
applications of XML.  I get the feeling that you and I have probably 
encountered different subsets of them.  My belief is that there are 
different legitimate types of XML library to support the different 
kinds of application.
While I agree with that, that wasn't quite my point. Rather, I tried to 
point out that you can't support only a subset of XML and still 
claim to provide an XML library.
...
...
How does it deal with input needing "preprocessing", such as entity 
substitution, or (X)inclusion ?
"it" here meaning rapidxml.  In one mode of operation, it replaces 
entities (i.e. &lt;) during parsing; this obviously breaks the idea of 
not using much RAM since the mmaped file will copy-on-write pages as 
this happens.  In another mode it doesn't do this and leaves it as a 
job for the user.
If the user is exposed to it, I would argue this is not a sufficient API 
to call itself "XML bindings". The spec has some rather specific 
discussion on what ought to be done at parsing, and what the result 
would be (e.g. http://www.w3.org/TR/xml-infoset/). I strongly object to 
an "XML library" that offers something else. (To be clear: I certainly 
don't object to such libraries in themselves, but please don't confuse 
"XML" with "XML-like".)
...
In my library I have an iterator that processes a text node decoding 
entities as they are encountered.  This currently only recognises the 
"default" entities i.e. lt, gt, amp, quot, apos and numerics.  It 
would be possible to extend this to decode entities declared in the 
document, if that were necessary, but it's not something I've ever 
needed to do.
Fine. Again, the XML spec clearly defines when and how entities ought to 
be handled (http://www.w3.org/TR/REC-xml/#entproc). And to the degree 
that this processing is specified, an XML library ought to honor it.
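
To make the entity discussion concrete, here is a minimal sketch of the limited decoding described above: the five predefined entities plus numeric character references, with everything else left untouched. The names decode_entities, parse_char_ref and append_utf8 are hypothetical and not taken from rapidxml or any other library mentioned in this thread. It assumes UTF-8 output and does not check the XML Char production beyond rejecting zero and values above 0x10FFFF. Entities declared in a DTD are exactly what it does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the code point cp to out, encoded as UTF-8.
inline void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse the text after '#' in a character reference ("65" or "x41").
// Returns false if it is not a well-formed decimal or hex number.
inline bool parse_char_ref(const std::string& body, unsigned long& cp) {
    std::size_t pos = 0;
    unsigned base = 10;
    if (!body.empty() && body[0] == 'x') { base = 16; pos = 1; }
    if (pos == body.size()) return false;
    cp = 0;
    for (; pos < body.size(); ++pos) {
        const char c = body[pos];
        unsigned d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return false;
        cp = cp * base + d;
        if (cp > 0x10FFFF) return false;  // beyond the Unicode range
    }
    return cp != 0;
}

// Decode predefined entities and numeric character references.
// Unknown or malformed references are copied through verbatim.
inline std::string decode_entities(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '&') { out += in[i++]; continue; }
        const std::size_t semi = in.find(';', i + 1);
        if (semi == std::string::npos) {  // no terminator: copy the rest
            out.append(in, i, std::string::npos);
            break;
        }
        const std::string name = in.substr(i + 1, semi - i - 1);
        unsigned long cp = 0;
        if (name == "lt") out += '<';
        else if (name == "gt") out += '>';
        else if (name == "amp") out += '&';
        else if (name == "quot") out += '"';
        else if (name == "apos") out += '\'';
        else if (name.size() > 1 && name[0] == '#' &&
                 parse_char_ref(name.substr(1), cp)) append_utf8(out, cp);
        else out.append(in, i, semi - i + 1);  // e.g. a DTD-declared entity
        i = semi + 1;
    }
    return out;
}
```

A reference such as &foo; passes through unchanged, which is the gap under discussion: a conforming processor would have to look it up in the document's declarations instead.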
...
I believe that a lot of XML features like entity declarations and 
namespaces declared not in the root element are painful precisely 
because they are tedious to implement, detrimental to performance, and 
never used in real-world XML documents.  My guess is that you would 
disagree with that.
I don't disagree, but I think that the world doesn't need yet another 
library that supports some Not-Quite-XML.
...
Neither rapidxml nor my library supports xinclude.  In my case, I can 
imagine adding it by modifying the element iterator such that 
dereferencing an xi:include element would open the referenced document 
and return its root element.
That, too, is not conforming to the XML spec 
(http://www.w3.org/TR/xinclude/#processing)
...
...
Also, this clearly only works with immutable input.
I think rapidxml lets you modify a document; it must allocate storage 
for the new strings somewhere and update its tree to point to them.  
My library does not allow this.  I don't think I've ever needed to 
modify an XML document: I have only either read in or written out a file.
Again: that's fine, and I agree it would be great for a boost.xml 
library to optimize for that case. However, I don't think it should 
optimize for it by disallowing the infoset to be modified.
...
Default attribute values defined in a DTD are an excellent example of 
an XML misfeature not used in any XML application that I care about 
that simply result in XML processors being more complex and slower 
than they would otherwise need to be.  (Please feel free to list any 
XML applications that make use of them.)
Same argument. You may not care, but others do.
...
However, I wouldn't say that these features are fundamentally 
incompatible with my approach in this library.  It's only necessary 
that when you look up an attribute, the returned range somehow 
includes pseudo-elements corresponding to the default attributes.
I certainly expect an attribute iterator to make no distinction between 
explicitly specified attributes and default attributes. The XML Infoset 
spec clearly defines which parts of an XML file are semantically 
relevant and which are not. I want boost.xml to honor those 
semantics.
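
As a rough illustration of that expectation, the sketch below merges the attributes written in a start-tag with the defaults declared for the element type, so a caller iterating the result sees one uniform list. The types Attribute and AttributeDefaults and the function effective_attributes are hypothetical names for this example only, not an existing or proposed boost.xml API. The specified flag mirrors the Infoset's [specified] property on attribute information items.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One attribute as seen by the user of the library.
struct Attribute {
    std::string name;
    std::string value;
    bool specified;  // true if written in the start-tag, false if defaulted
};

// Defaults declared for one element type: attribute name -> default value.
typedef std::map<std::string, std::string> AttributeDefaults;

// Build the attribute list a caller would iterate over: explicit attributes
// first, followed by every declared default that was not explicitly given.
inline std::vector<Attribute> effective_attributes(
    const std::vector<std::pair<std::string, std::string> >& explicit_attrs,
    const AttributeDefaults& defaults)
{
    std::vector<Attribute> result;
    for (std::size_t i = 0; i < explicit_attrs.size(); ++i) {
        Attribute a = { explicit_attrs[i].first, explicit_attrs[i].second, true };
        result.push_back(a);
    }
    for (AttributeDefaults::const_iterator d = defaults.begin();
         d != defaults.end(); ++d) {
        bool present = false;
        for (std::size_t i = 0; i < explicit_attrs.size(); ++i) {
            if (explicit_attrs[i].first == d->first) { present = true; break; }
        }
        if (!present) {
            Attribute a = { d->first, d->second, false };
            result.push_back(a);
        }
    }
    return result;
}
```

Code that only reads attributes can ignore the flag entirely; it is there for the cases, such as serialization, where the difference between specified and defaulted may matter.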

Thanks,
         Stefan

-- 

       ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld