[serialization] XML archive is order-dependent

I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails. Since XML is order-independent, is it possible to make boost::serialization also order-independent?

Stan

Stan Vasilyev wrote:
I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails.
Since XML is order-independent, is it possible to make boost::serialization also order-independent?
I think there is a Google Summer of Code project to make a simple XML archive, since it seems many people use the XML archive for config files and the like, so I would hope that order-independence would be part of it. I like the current archive as it is, but would like the simpler one added too.

We serialize our data files, which can be up to 100Mb in size when written as XML. Parsing those as order-independent would, I would guess, involve a huge penalty (and they are already drastically slower than binary archives).

Cheers
Russell

This would be possible but ...

a) It would require some effort and no one has found the benefit to be worth that effort.
b) It would usually require that the whole archive be loaded into memory to be parsed. This would mean that one could no longer handle an arbitrarily sized data set. For some applications this wouldn't be a problem. For others - a deal breaker.

So it's a question of trading one limitation for another - arbitrary file size vs arbitrary member order.

Robert Ramey

Stan Vasilyev wrote:
I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails.
Since XML is order-independent, is it possible to make boost::serialization also order-independent?
Stan
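
To make the trade-off above concrete, here is a minimal sketch of a reader that consumes elements strictly in the order the loading code asks for them. It is not Boost code: `SequentialReader`, `Point`, `load_point` and `loads` are hypothetical names, and the toy format (flat `<name>int</name>` elements, no attributes, no nesting) is an assumption made only for illustration. The reader never looks back or ahead, so it can work on input of any size, but a reordered or extra tag makes it fail.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical toy reader (NOT Boost code): it walks the input once and
// expects each "<name>int</name>" element exactly when the caller asks for it.
class SequentialReader {
public:
    explicit SequentialReader(const std::string& text) : in_(text) {}

    // Reads the next element and requires its tag to be `name`.
    int read_int(const std::string& name) {
        const std::string open = next_tag();
        if (open != name)
            throw std::runtime_error("expected <" + name + ">, found <" + open + ">");
        int value = 0;
        if (!(in_ >> value))
            throw std::runtime_error("bad value in <" + name + ">");
        if (next_tag() != "/" + name)
            throw std::runtime_error("missing </" + name + ">");
        return value;
    }

private:
    // Skips whitespace, then returns the text between '<' and '>'.
    std::string next_tag() {
        in_ >> std::ws;
        if (in_.get() != '<')
            throw std::runtime_error("expected '<'");
        std::string tag;
        if (!std::getline(in_, tag, '>'))
            throw std::runtime_error("unterminated tag");
        return tag;
    }

    std::istringstream in_;
};

struct Point { int x = 0; int y = 0; };

// The load order is fixed by the code: x first, then y.
Point load_point(const std::string& xml) {
    SequentialReader reader(xml);
    Point p;
    p.x = reader.read_int("x");
    p.y = reader.read_int("y");
    return p;
}

// True when the document loads without an error.
bool loads(const std::string& xml) {
    try { load_point(xml); return true; }
    catch (const std::runtime_error&) { return false; }
}

// Usage: the same data is accepted or rejected depending only on tag order.
void usage_example() {
    assert(load_point("<x>1</x><y>2</y>").y == 2);       // code order: accepted
    assert(!loads("<y>2</y><x>1</x>"));                  // swapped: rejected
    assert(!loads("<x>1</x><extra>9</extra><y>2</y>"));  // extra tag: rejected
}
```

The design choice is the one described above: nothing is buffered, so memory use does not grow with the archive, and the price is that the document has to match the code's order.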

On 6/19/06, Robert Ramey wrote:
This would be possible but ...
a) It would require some effort and no one has found the benefit to be worth that effort.
b) It would usually require that the whole archive be loaded into memory to be parsed. This would mean that one could no longer handle an arbitrary sized data set. For some applications this wouldn't be a problem. For others - a deal breaker.
That makes sense. Thanks

On Sun, 18 Jun 2006 17:24:48 -0700, Stan Vasilyev wrote:
Since XML is order-independent, is it possible to make boost::serialization also order-independent?
I'm pretty sure XML is order-dependent. Some languages whose syntax is based on XML, like RDF/XML, could have semantics that are independent of the order of the tags in the document, but different orderings of the tags produce different XML documents and different XML infosets...

Syntactically,
Nowhere man
--
nowhere.man@levallois.eu.org
OpenPGP 0xD9D50D8A

Pierre THIERRY wrote:
I'm pretty sure XML is order-dependent. Some languages whose syntax is based on XML, like RDF/XML, could have semantics that are independent of the order of the tags in the document, but different orderings of the tags produce different XML documents and different XML infosets...
I'm sorry to disagree, but each XML element can be order-dependent or not according to the XML schema used. In an XML schema you can specify that the sub-elements of a given element must appear in a given order, or that they can appear in any order and any number of times. Furthermore, you can specify whether an element may contain new sub-elements (open content model) or only those sub-elements specified in the schema (closed content model).

However, boost::serialization doesn't care about XML schemas, and the current serialization XML archive seems to use SAX-style parsing to process the XML document, so it must be order-dependent. If the XML were parsed via a DOM, then elements and attributes could be requested in any order but, as Robert Ramey said, you would need to load the whole XML document into memory, which is overkill for many applications.

Hopefully somebody will have the time to build a DOM-parsing XML archive, and then developers can select SAX or DOM parsing depending on the application.

Just my 2 cents,
-delfin
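
As a rough illustration of the DOM-style alternative, the sketch below reads every element into memory first and only then looks members up by name, in whatever order the loading code likes. It is a stand-in, not a real DOM and not Boost code: `load_all`, `get_int`, `Point` and `load_point` are hypothetical names, and the toy format (flat `<name>value</name>` elements, no attributes, no nesting) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a DOM: every flat "<name>value</name>" element
// is stored in memory, keyed by its tag name, before anything is requested.
std::map<std::string, std::string> load_all(const std::string& xml) {
    std::map<std::string, std::string> nodes;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t name_end = xml.find('>', pos);
        if (name_end == std::string::npos)
            throw std::runtime_error("unterminated tag");
        const std::string name = xml.substr(pos + 1, name_end - pos - 1);
        const std::string closing = "</" + name + ">";
        const std::size_t value_end = xml.find(closing, name_end + 1);
        if (value_end == std::string::npos)
            throw std::runtime_error("missing " + closing);
        nodes[name] = xml.substr(name_end + 1, value_end - name_end - 1);
        pos = value_end + closing.size();
    }
    return nodes;
}

// Looks a member up by name; document order no longer matters.
int get_int(const std::map<std::string, std::string>& nodes,
            const std::string& name) {
    const auto it = nodes.find(name);
    if (it == nodes.end())
        throw std::runtime_error("missing <" + name + ">");
    return std::stoi(it->second);
}

struct Point { int x = 0; int y = 0; };

Point load_point(const std::string& xml) {
    const auto nodes = load_all(xml);  // whole document held in memory here
    Point p;
    p.x = get_int(nodes, "x");
    p.y = get_int(nodes, "y");
    return p;
}

// Usage: reordered and extra tags are both tolerated.
void usage_example() {
    const Point p = load_point("<y>2</y><extra>9</extra><x>1</x>");
    assert(p.x == 1);
    assert(p.y == 2);
}
```

The cost is visible in `load_all`: the map holds the entire document before the first member is read, which is the memory limitation mentioned earlier in the thread.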

On Tue, 20 Jun 2006 10:02:37 -0700, Delfin Rojas wrote:
I'm sorry to disagree, but each XML element can be order-dependent or not according to the XML schema used. In an XML schema you can specify that the sub-elements of a given element must appear in a given order, or that they can appear in any order and any number of times.
But that doesn't make the different orderings an equivalence class for documents or infosets. That is, a schema could allow:

  <root>
    <foo value="1"/>
    <bar value="2"/>
  </root>

and

  <root>
    <bar value="2"/>
    <foo value="1"/>
  </root>

and the application reading them could consider that they provide the same information, but they would still constitute two different documents and infosets...
However, boost::serialization doesn't care about XML schemas and the current serialization xml archive seems to use a SAX-style parsing to process the XML document so it must be order-dependent.
Though SAX-parsing could enable order-independence: you could fill the foo member wherever you're parsing it and the bar member the same way, AFAIK. But I didn't look at the way the serialization works, so maybe it's only possible with DOM.

Doubtfully,
Nowhere man
--
nowhere.man@levallois.eu.org
OpenPGP 0xD9D50D8A
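
The idea of filling each member wherever it turns up can be sketched with an event-style parser that reports one element at a time and a table of setters keyed by tag name. This is only an illustration under assumptions, not how boost::serialization works: `parse_events`, `Config` and `load_config` are hypothetical names, and the toy format stores values as element text (`<foo>1</foo>`) rather than in attributes.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

using ElementHandler =
    std::function<void(const std::string& name, const std::string& text)>;

// Hypothetical event source: reports each flat "<name>text</name>" element
// in document order and keeps nothing once the handler returns.
// Any text between elements is skipped.
void parse_events(std::istream& in, const ElementHandler& on_element) {
    std::string skipped, name, text, closing;
    while (std::getline(in, skipped, '<') && std::getline(in, name, '>')) {
        std::getline(in, text, '<');
        std::getline(in, closing, '>');
        if (closing != "/" + name)
            throw std::runtime_error("missing </" + name + ">");
        on_element(name, text);
    }
}

struct Config { int foo = 0; int bar = 0; };

Config load_config(std::istream& in) {
    Config c;
    // One setter per known tag; each member is filled whenever its tag appears.
    const std::map<std::string, std::function<void(const std::string&)>> setters = {
        {"foo", [&c](const std::string& s) { c.foo = std::stoi(s); }},
        {"bar", [&c](const std::string& s) { c.bar = std::stoi(s); }},
    };
    parse_events(in, [&setters](const std::string& name, const std::string& text) {
        const auto it = setters.find(name);
        if (it != setters.end())
            it->second(text);  // unknown tags are ignored
    });
    return c;
}

// Usage: both orderings of the two elements load to the same values.
void usage_example() {
    std::istringstream a("<foo>1</foo><bar>2</bar>");
    std::istringstream b("<bar>2</bar><foo>1</foo>");
    const Config ca = load_config(a);
    const Config cb = load_config(b);
    assert(ca.foo == cb.foo && ca.bar == cb.bar);
}
```

This works here because the set of members is known up front and flat. Whether the same approach fits a general archive, where the loading code decides what to read next, is left open above, and this sketch does not settle it.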
participants (5)
- Delfin Rojas
- Pierre THIERRY
- Robert Ramey
- Russell Hind
- Stan Vasilyev