Re: [Boost-users] [Serialization] a coupla crashes from hiearchy complexity
Robert,

The 1/2 megabyte string being converted is pure XML generated by the xml_oarchive and read in using xml_iarchive in the boost serialization library. It is being inserted as a single tag of an XML item serialized entirely by the xml_oarchive and xml_iarchive in the boost serialization library. I believe that there are only 5 characters that need to be converted: less than (<), greater than (>), quote ("), apostrophe ('), and ampersand (&).

Changing a block of text containing these characters to their corresponding escape sequences doesn't seem very difficult to me. It is being done completely automatically by the serialization library.

Our current solution, which uses a text archive on the front end and stores the internal strings as xml, seems to work for now and completely avoids this conversion. I just wanted to let you know that the conversion routine in the boost serialization library doesn't work for large buffers.

I had to laugh at your response at sending this library to you. That certainly wouldn't have been MY first choice, but I did want to make the offer.

Ed

-----Original Message-----
From: Robert Ramey [mailto:ramey@rrsd.com]
Sent: Wednesday, November 15, 2006 8:33 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Serialization] acouplacrashesfromhiearchycomplexity

"Reusser, Edward" <Edward.Reusser@actel.com> wrote in message news:5E916BAE1732F344BAFD713DF2373479016EAAA9@SV-MSG-01.amer.actel.com...

Robert,

But the same exact problem occurs. Even using boost serialization, a text value within 2 tags cannot exceed about ½ megabyte in length if it contains escaped character sequences, because when it is read back it will not be properly converted to XML. I know the entire XML string is being translated out to the file because I can open the file and see that it is correct. Furthermore, the serialization library doesn't report any errors in reading the key/value pairs. It isn't until I parse the value using the xml_iarchive that it fails.

**** I suspect that both serialization and xerces are having the same problem. The problem is that the standards for xml/html escape sequences are ambiguous, so some characters are not correctly escaped. This occurs when the string includes non-string characters like null, and who knows what else. It's probably not a hard fix - but it would take a lot of time to sort out all the varying standardese. If anyone wants to do this, let me know.

If you're doing 1/2 MB of string - it's possible it contains some weird characters that neither xerces nor I know what to do with. So a couple of things to try would be:

a) try the wide-character XML archives (xml_woarchive / xml_wiarchive) - these store data in UTF-8 and might be more robust.

b) try storing the string as a "binary object". This would consume more space as it stores the data in binary coded text - but should be bulletproof.

c) If you really, really need this, and you want to do your bit for humanity, you can check out the xml escape/unescape that the serialization library uses. This is a little out of whack as it uses one method - dataflow iterators - to escape, and another - spirit parser - to unescape. Maybe that should be changed or at least looked at. When in doubt - I prefer symmetry, as should be apparent from the naming conventions used by the serialization library.

As for the 2nd crash, using text_iarchive and text_oarchive within Library B, this still gives the internal structure overflow error. When I get time to go back and look at it again, I will probably rewrite it so that all of the shared_ptr's are stored directly and make other simplifications. If you have a mechanism for me to send this DLL directly to you, then I will attempt to get permission to do so.

*** Errrr - thanks but no thanks. I would prefer to run this through you.

Robert Ramey
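Suggestion (b) in the quoted reply, storing the string as "binary coded text", can be illustrated without Boost. The sketch below is not the serialization library's code, and the encoding Boost actually uses for binary objects may differ; hex is chosen here only for brevity. The helper names to_hex and from_hex are hypothetical. The point is that the encoded form contains only the characters 0-9 and a-f, so nothing in it ever needs XML escaping, at the cost of doubling the size.

```cpp
#include <cstddef>
#include <string>

// Encode every byte as two lowercase hex digits. The output alphabet is
// [0-9a-f], so it contains no characters that need XML escaping.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Decode a hex string back into raw bytes.
// Returns false (leaving `out` untouched) on odd length or a non-hex digit.
bool from_hex(const std::string& hex, std::string& out) {
    if (hex.size() % 2 != 0) return false;
    auto value = [](char ch) -> int {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    };
    std::string result;
    result.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = value(hex[i]);
        int lo = value(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        result += static_cast<char>((hi << 4) | lo);
    }
    out = result;
    return true;
}
```

A string containing NUL, bell, or markup characters survives to_hex followed by from_hex unchanged, which is the "bulletproof" property the suggestion is after.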
Reusser, Edward wrote:
Robert,
The 1/2 megabyte string being converted is pure XML generated by the xml_oarchive and read in using xml_iarchive in the boost serialization library. It is being inserted as a single tag of an XML item serialized entirely by the xml_oarchive and xml_iarchive in the boost serialization library. I believe that there are only 5 characters that need to be converted: less than (<), greater than (>), quote ("), apostrophe ('), and ampersand (&).
Changing a block of text containing these characters to their corresponding escape sequences doesn't seem very difficult to me. It is being done completely automatically by the serialization library.
This is being done. What's somewhat up in the air are characters like ASCII 127 (rubout), ASCII 07 (bell), ASCII 0 (nul), etc. Sometimes people have had problems with strings that include non-text characters such as these, which can create problems with text-based representations. I mentioned this as something that could explain the problem, that's all.
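The two points in this exchange, the five-character escape and the question of non-text characters, can be sketched in plain C++. This is an illustration only, not the serialization library's implementation (which, per the earlier message, uses dataflow iterators to escape and a spirit parser to unescape). The function names escape_xml and first_risky_byte are hypothetical. The second function assumes XML 1.0 rules: control characters below 0x20 other than tab, LF, and CR are not legal characters, while DEL (0x7F) is legal but discouraged, so it is flagged as well.

```cpp
#include <cstddef>
#include <string>

// Replace the five XML special characters with their predefined entities.
std::string escape_xml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            case '&':  out += "&amp;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Return the index of the first byte that is likely to cause trouble in an
// XML text node, or std::string::npos if there is none.
// Flags C0 control characters other than tab, LF, and CR (not legal in
// XML 1.0) and DEL, 0x7F (legal but discouraged).
std::size_t first_risky_byte(const std::string& in) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        bool c0_control = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
        if (c0_control || c == 0x7F) return i;
    }
    return std::string::npos;
}
```

Running first_risky_byte over the 1/2 MB string before serializing it would be one quick way to confirm or rule out the non-text-character explanation.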
Our current solution, which uses a text archive on the front end and stores the internal strings as xml seems to work for now and completely avoids this conversion. I just wanted to let you know that the conversion routine in the boost serialization library doesn't work for large buffers.
The fact that xerces has problems with it also makes me wonder if it's something common between the two. Note that text archives use a data-independent system for strings: they record the size of the string, then the raw characters with no escapes. This might also suggest that there is an issue with html escapes. The fact that the problem occurs with a 1/2 MB string could also be significant and might even point to something in the local stream implementation. Since XML is much more verbose, it's possible that moving to the text archive just hides the problem rather than really fixing it.
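The size-then-raw-characters scheme described above can be sketched as follows. This is a minimal illustration of the idea, not the text archive's actual code or file format; write_counted, read_counted, and round_trip are hypothetical names. Because the reader is told exactly how many bytes follow, the payload never has to be scanned for delimiters or escapes.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write "<length> <raw bytes>". Nothing in the payload needs escaping
// because the reader is told exactly how many bytes to consume.
void write_counted(std::ostream& os, const std::string& s) {
    os << s.size() << ' ';
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read the length, skip the single separating space, then read exactly
// that many raw bytes. Returns false if the input is malformed or short.
bool read_counted(std::istream& is, std::string& out) {
    std::size_t n = 0;
    if (!(is >> n)) return false;
    if (is.get() != ' ') return false;
    out.assign(n, '\0');
    if (n == 0) return true;
    is.read(&out[0], static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(is.gcount()) == n;
}

// Convenience helper: push a string through an in-memory stream and back.
std::string round_trip(const std::string& s) {
    std::stringstream buf;
    write_counted(buf, s);
    std::string back;
    read_counted(buf, back);
    return back;
}
```

A string full of markup or control characters comes back byte for byte, which fits the observation that switching to the text archive sidesteps the escape/unescape path entirely rather than fixing it.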
I had to laugh at your response at sending this library to you.
That is my function here. I do believe you're on to something here - but I don't have enough information to offer much help.

Robert Ramey
participants (2)
- Reusser, Edward
- Robert Ramey