New subject: [Serialization] a coupla crashes from hierarchy complexity

16 Nov 2006

Robert,

The 1/2 megabyte string being converted is pure XML, generated by xml_oarchive and read back in with xml_iarchive from the Boost serialization library.  It is stored as the value of a single tag in an XML document that is itself serialized entirely by xml_oarchive and xml_iarchive.  I believe only five characters need to be converted:  less than (<), greater than (>), quote ("), apostrophe ('), and ampersand (&).

Changing a block of text containing these characters to their corresponding escape sequences doesn't seem very difficult to me.  It is being done completely automatically by the serialization library.  
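For illustration, here is a minimal sketch of the five-character conversion described above, paired with a matching inverse so that escaping and unescaping are symmetric. The names xml_escape and xml_unescape are hypothetical; this is not the Boost serialization library's own code, which (per Robert's reply below) escapes with dataflow iterators and unescapes with a Spirit parser. Both functions make a single linear pass over the buffer.

```cpp
#include <cstddef>
#include <string>

// Escape the five characters that XML treats specially, using the
// predefined entities.  Every other byte is copied through unchanged.
std::string xml_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            case '&':  out += "&amp;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Inverse of xml_escape: recognises only the five entities above.
// Any other '&' sequence (numeric references included) is copied verbatim.
std::string xml_unescape(const std::string& in) {
    struct Entity { const char* text; std::size_t len; char ch; };
    static const Entity table[] = {
        {"&lt;", 4, '<'}, {"&gt;", 4, '>'}, {"&quot;", 6, '"'},
        {"&apos;", 6, '\''}, {"&amp;", 5, '&'},
    };
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const Entity& e : table) {
                // compare() returns 0 only if the full entity text is present at i.
                if (in.compare(i, e.len, e.text) == 0) {
                    out += e.ch;
                    i += e.len;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Because xml_unescape accepts exactly the entities xml_escape produces, xml_unescape(xml_escape(s)) == s holds for any input, including buffers well over 1/2 megabyte.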

Our current solution, which uses a text archive on the front end and stores the internal strings as XML, seems to work for now and avoids this conversion entirely.  I just wanted to let you know that the conversion routine in the Boost serialization library doesn't work for large buffers.

I had to laugh at your response to my offer to send this library to you.  That certainly wouldn't have been MY first choice, but I did want to make the offer.

Ed

-----Original Message-----
From: Robert Ramey [mailto:ramey@rrsd.com] 
Sent: Wednesday, November 15, 2006 8:33 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Serialization] a coupla crashes from hierarchy complexity

"Reusser, Edward" <Edward.Reusser@actel.com> wrote in message 
news:5E916BAE1732F344BAFD713DF2373479016EAAA9@SV-MSG-01.amer.actel.com...
Robert,

But the same exact problem occurs.  Even using boost serialization, a text 
value between two tags cannot exceed about ½ megabyte in length if it contains 
escaped character sequences, because when it is read back it is not 
properly converted back to XML.  I know the entire XML string is being written 
out to the file because I can open the file and see that it is correct. 
Furthermore, the serialization library doesn't report any errors when reading 
the key/value pairs.  It isn't until I parse the value using the 
xml_iarchive that it fails.

****
I suspect that both serialization and Xerces are having the same problem. 
The problem is that the standards for XML/HTML escape sequences are ambiguous, 
so some strings are not correctly escaped.  This occurs when the string includes 
non-text characters like null, and who knows what else.  It's probably not a hard fix, 
but it would take a lot of time to sort out all the varying standardese.  If 
anyone wants to do this, let me know.  If you're storing 1/2 MB of string, it's 
possible it contains some weird characters that neither Xerces nor I know 
what to do with.
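The suspicion about characters such as null can be checked directly. XML 1.0 does not allow C0 control characters other than tab, line feed and carriage return anywhere in a document, not even as numeric character references (so &#0; or &#1; is not well-formed), which means no escaping scheme can carry them. The sketch below, using a hypothetical helper find_invalid_xml10_byte, scans a buffer for such bytes before it is handed to an archive. It checks only that single-byte control range; it does not validate multi-byte encodings or the other code points XML 1.0 excludes.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first byte that XML 1.0 does not allow in
// character data (C0 controls other than TAB, LF and CR), or
// std::string::npos if there is none.  Only the single-byte control
// range is checked; multi-byte encoding errors are not detected.
std::size_t find_invalid_xml10_byte(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            return i;
        }
    }
    return std::string::npos;
}
```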

So a couple of things to try would be:

a) Try the wide-character XML archives (xml_woarchive/xml_wiarchive).  These store data in UTF-8 and might be more robust.
b) Try storing the string as a "binary object".  This consumes more 
space, since it stores the data as binary-coded text, but it should be bulletproof.
c) If you really, really need this, and you want to do your bit for 
humanity, you can check out the XML escape/unescape code that the serialization 
library uses.  This is a little out of whack, as it uses one method 
(dataflow iterators) to escape and another (a Spirit parser) to unescape. 
Maybe that should be changed, or at least looked at.  When in doubt, I 
prefer symmetry, as should be apparent from the naming conventions used by 
the serialization library.
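Option (b) works because a binary-to-text encoding such as base64 emits only characters that never need XML escaping, at a cost of roughly 4 output bytes for every 3 input bytes. The following plain base64 encoder (RFC 4648 alphabet, hypothetical name base64_encode) illustrates that property; it is a sketch, not the serialization library's implementation. In the library itself, boost::serialization::make_binary_object is the documented way to mark a buffer as binary.

```cpp
#include <cstddef>
#include <string>

// Standard base64 (RFC 4648) with '=' padding.  The output uses only
// A-Z, a-z, 0-9, '+', '/' and '=', none of which needs XML escaping.
std::string base64_encode(const std::string& in) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Each full 3-byte group becomes 4 output characters.
    for (; i + 2 < in.size(); i += 3) {
        unsigned long n = (static_cast<unsigned char>(in[i]) << 16) |
                          (static_cast<unsigned char>(in[i + 1]) << 8) |
                          static_cast<unsigned char>(in[i + 2]);
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += alphabet[n & 0x3F];
    }
    // A trailing 1 or 2 bytes are padded out with '='.
    std::size_t rest = in.size() - i;
    if (rest == 1) {
        unsigned long n = static_cast<unsigned char>(in[i]) << 16;
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        unsigned long n = (static_cast<unsigned char>(in[i]) << 16) |
                          (static_cast<unsigned char>(in[i + 1]) << 8);
        out += alphabet[(n >> 18) & 0x3F];
        out += alphabet[(n >> 12) & 0x3F];
        out += alphabet[(n >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```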

As for the second crash: using text_iarchive and text_oarchive within Library 
B still gives the internal structure overflow error.  When I get time 
to go back and look at it again, I will probably rewrite it so that all of 
the shared_ptrs are stored directly, and make other simplifications.

If you have a mechanism for me to send this DLL directly to you, then I will 
attempt to get permission to do so.

***
Errrr, thanks but no thanks.  I would prefer to run this through you.

Robert Ramey

The information contained in or attached to this email may be subject to the Export Administration Regulations (EAR), administered by the U.S. Department of Commerce, or the International Traffic in Arms Regulations (ITAR), administered by the U.S. Department of State, and may require an export license from the Commerce or State Department prior to its export.  An export can include a release or disclosure to a foreign national inside or outside the United States.  Include this notice with any reproduced portion of this information.
