[serialization] XML archive is order-dependent - Boost-users - lists.preview.boost.org


Stan Vasilyev

19 Jun 2006

12:24 a.m.

I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails.

Since XML is order-independent, is it possible to make boost::serialization also order-independent?

Stan


Russell Hind

19 Jun 2006

7:38 a.m.

Stan Vasilyev wrote:

I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails.

Since XML is order-independent, is it possible to make boost::serialization also order-independent?

I think there is a Google Summer of Code project to make a simple XML archive, since many people seem to be using the XML archive for config files and the like, so I would hope that order-independence would be part of it. I like the current archive as it is, but would like the simpler one added too.

We serialize our data files, which can be up to 100 MB in size when written as XML. Parsing those as order-independent would involve a huge penalty, I would guess (and they are already drastically slower than binary archives).

Cheers

Russell


Robert Ramey

3:33 p.m.

This would be possible but ...

a) It would require some effort and no one has found the benefit to be worth that effort.

b) It would usually require that the whole archive be loaded into memory to be parsed. This would mean that one could no longer handle an arbitrarily sized data set. For some applications this wouldn't be a problem. For others - a deal breaker.

So it's a question of trading one limitation for another - arbitrary file size vs arbitrary member order.

Robert Ramey

Stan Vasilyev wrote:

I'm using boost::archive::xml_iarchive to read data from an XML archive created by hand. It works as long as I keep all the tags in the same order that they are serialized in my code. As soon as I put in an extra XML tag somewhere, de-serialization fails.

Since XML is order-independent, is it possible to make boost::serialization also order-independent?

Stan


Stan Vasilyev

20 Jun 2006

5:47 a.m.

On 6/19/06, Robert Ramey <ramey@rrsd.com> wrote:

This would be possible but ...

a) It would require some effort and no one has found the benefit to be worth that effort. b) It would usually require that the whole archive be loaded into memory to be parsed. This would mean that one could no longer handle an arbitrary sized data set. For some applications this wouldn't be a problem. For others - a deal breaker.

That makes sense. Thanks


Pierre THIERRY

8:04 a.m.

On Sun, 18 Jun 2006 17:24:48 -0700, Stan Vasilyev wrote:

Since XML is order-independent, is it possible to make boost::serialization also order-independent?

I'm pretty sure XML is order-dependent. Some languages whose syntax is based on XML, like RDF/XML, could have semantics that are independent of the order of the tags in the document, but different orderings of the tags produce different XML documents and different XML infosets...

Syntaxically,
Nowhere man
--
nowhere.man@levallois.eu.org
OpenPGP 0xD9D50D8A


Delfin Rojas

5:02 p.m.

Pierre THIERRY wrote:

I'm pretty sure XML is order-dependent. Some languages whose syntax is based on XML, like RDF/XML, could have semantics that are independent of the order of the tags in the document, but different orderings of the tags produce different XML documents and different XML infosets...

I'm sorry to disagree, but each XML element can be order-dependent or not according to the XML schema used. In an XML schema you can specify that the sub-elements of a given element must appear in a given order, or that they can appear in any order and any number of times. Furthermore, you can specify whether an element may contain new sub-elements (open content model) or only those sub-elements specified in the schema (closed content model).

However, boost::serialization doesn't care about XML schemas, and the current serialization XML archive seems to use SAX-style parsing to process the XML document, so it must be order-dependent. If the XML were parsed via a DOM, then elements and attributes could be requested in any order, but as Robert Ramey said, you would need to load the whole XML into memory, which is overkill for many applications.

Hopefully somebody will have the time to build a DOM-parsing XML archive, and then developers can select SAX or DOM parsing depending on the application.

Just my 2 cents,

-delfin
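To make the trade-off described above concrete, here is a minimal sketch in plain C++ (no Boost). It is not how boost::serialization is implemented. The `Elements` type, the `Config` struct, and both loader functions are hypothetical names invented for illustration, and a vector of (tag, text) pairs stands in for a parsed XML element stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the elements of an XML document: (tag, text) pairs.
using Elements = std::vector<std::pair<std::string, std::string>>;

// Hypothetical object being loaded.
struct Config {
    std::string host;
    std::string port;
};

// Sequential load: consumes elements one by one and requires each element
// to be the one the code asks for next. Fails if the order differs.
bool load_sequential(const Elements& in, Config& out) {
    std::size_t pos = 0;
    auto next = [&](const std::string& expected, std::string& target) {
        if (pos >= in.size() || in[pos].first != expected) return false;
        target = in[pos++].second;
        return true;
    };
    return next("host", out.host) && next("port", out.port);
}

// Tree-style load: index the whole document first, then look up by name.
// Order no longer matters, but everything is held in memory at once.
bool load_indexed(const Elements& in, Config& out) {
    std::unordered_map<std::string, std::string> index(in.begin(), in.end());
    auto host = index.find("host");
    auto port = index.find("port");
    if (host == index.end() || port == index.end()) return false;
    out.host = host->second;
    out.port = port->second;
    return true;
}
```

With the elements in the order `host`, `port`, both loaders succeed. With the order swapped, `load_sequential` fails while `load_indexed` still succeeds, at the cost of building the index over the full input before reading anything.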


Pierre THIERRY

22 Jun 2006

2:05 p.m.

On Tue, 20 Jun 2006 10:02:37 -0700, Delfin Rojas wrote:

I'm sorry to disagree, but each XML element can be order-dependent or not according to the XML schema used. In an XML schema you can specify that the sub-elements of a given element must appear in a given order, or that they can appear in any order and any number of times.

But that doesn't make the different orderings an equivalence class for documents or infosets. That is, a schema could allow:

    <root>
      <foo value="1"/>
      <bar value="2"/>
    </root>

and

    <root>
      <bar value="2"/>
      <foo value="1"/>
    </root>

and the application reading them could consider that they provide the same information, but they still would constitute two different documents and infosets...

However, boost::serialization doesn't care about XML schemas and the current serialization xml archive seems to use a SAX-style parsing to process the XML document so it must be order-dependent.

Though SAX-parsing could enable order-independence: you could fill the foo member wherever you're parsing it and the bar member the same way, AFAIK. But I didn't look at the way the serialization works, so maybe it's only possible with DOM.

Doubtfully,
Nowhere man
--
nowhere.man@levallois.eu.org
OpenPGP 0xD9D50D8A
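The idea of filling each member wherever it turns up in the stream can be sketched as below. This is a sketch under stated assumptions, not a claim about what the serialization library could do internally. The `Record` struct and `load_by_name` function are hypothetical, and whitespace-separated "name value" pairs read from a stream stand in for SAX events. It only covers flat members of a single object.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical object with the two members from the example above.
struct Record {
    int foo = 0;
    int bar = 0;
};

// Reads "name value" events one at a time and assigns each value to the
// member with that name, in whatever order the events arrive. Only the
// current event is held in memory. Returns true if every known member
// was seen at least once.
bool load_by_name(std::istream& events, Record& out) {
    const std::unordered_map<std::string, int Record::*> members = {
        {"foo", &Record::foo},
        {"bar", &Record::bar},
    };
    std::unordered_set<std::string> seen;
    std::string name;
    int value = 0;
    while (events >> name >> value) {
        auto it = members.find(name);
        if (it == members.end()) continue;  // skip unknown elements
        out.*(it->second) = value;
        seen.insert(name);
    }
    return seen.size() == members.size();
}
```

Feeding it `foo 1`, `bar 2` or `bar 2`, `foo 1` yields the same `Record`, and an unrecognized extra element is skipped. The dispatch table is the design choice here: the reader is driven by the names in the input, whereas an order-dependent reader is driven by the order in which the code requests members.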
