[serialization] bug in wide character strings

We've discovered an issue in Boost when writing and reading wide character strings (wchar_t* and std::wstring) to and from non-wide character file streams (std::ifstream and std::ofstream). The issue stems from the fact that wide characters are written and read as a sequence of characters (in text_oarchive_impl.ipp and text_iarchive_impl.ipp, respectively). For text streams, an EOF character terminates the reading of a file on Windows. Some wide characters have EOF (value = 26 decimal) as one of their bytes, so reading that byte causes early termination of the read.

We have worked around the issue by deriving our own input and output archives from text_iarchive_impl<Archive> and text_oarchive_impl<Archive> and overriding load_override() and save_override() for std::wstring and wchar_t*. Our implementation just steps through the wide characters and writes them one by one as wchar_t to the archive. This isn't very elegant and is even less readable in the file than the current implementation, but it does resolve the problem.

I looked at both Boost 1.34.1 and 1.35 and didn't see a difference in the implementation here, so I'm assuming 1.35 still has the issue. I've been working with 1.34.1. Is this a known issue? Does 1.35 solve it in some other subtle way? Is there a better way that doesn't require us to derive our own archives? If not, is there a more elegant way of implementing the reading and writing of wide characters ourselves?

Thanks in advance.
Jeff Faust
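To make the failure mode above concrete, here is a minimal sketch that checks whether the in-memory bytes of a wide string include the value 26 (Ctrl-Z). It uses only the standard library; the helper name `has_ctrl_z_byte` and the constant `kCtrlZ` are made up for this illustration and are not part of Boost. It assumes, as described above, that the text archive writes the raw bytes of each wchar_t to the stream.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Ctrl-Z (decimal 26), which Windows text-mode streams treat as end-of-file.
const unsigned char kCtrlZ = 0x1A;

// Hypothetical helper (not a Boost function): returns true if any byte of
// the in-memory representation of `s` equals Ctrl-Z.
bool has_ctrl_z_byte(const std::wstring& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Copy the bytes of one wide character so they can be inspected.
        unsigned char bytes[sizeof(wchar_t)];
        std::memcpy(bytes, &s[i], sizeof(wchar_t));
        for (std::size_t j = 0; j < sizeof(wchar_t); ++j) {
            if (bytes[j] == kCtrlZ) {
                return true;
            }
        }
    }
    return false;
}
```

A string such as L"abc" passes this check, while a string containing the character with value 0x011A does not, because one of that character's bytes is 0x1A regardless of byte order.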

Compare your application with test_simple_class. This application saves and restores every data type supported by the compiler/library/platform. This should include wstring.
Robert Ramey
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

This test case only populates the wstring with lowercase letters 'a'-'z'. This does not seem to be a sufficient test for serializing wide strings.
Jeff
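As an illustration of the coverage concern, the following sketch builds test data that spans a range of wide character values instead of only 'a'-'z'. The function name `make_wide_range` is hypothetical and is not taken from the Boost test suite. The suggested range 0x0001 to 0x07FF is an assumption chosen to stay below the surrogate range while still including values such as 0x001A and 0x011A, which contain the byte 26.

```cpp
#include <string>

// Hypothetical test-data builder (not part of Boost): returns a string
// holding one wchar_t for every value in [first, last], in ascending order.
std::wstring make_wide_range(unsigned first, unsigned last) {
    std::wstring s;
    for (unsigned c = first; c <= last; ++c) {
        s.push_back(static_cast<wchar_t>(c));
    }
    return s;
}
```

A round-trip test could save `make_wide_range(0x0001, 0x07FF)` through the archive under test and compare the loaded string with the original.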

OK, make a Trac item out of this so I don't forget.
As a workaround, you could transform it to a vector of wchar_t or use binary_object.
Robert Ramey
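The vector workaround suggested above might look like the sketch below. The helper names `save_wstring` and `load_wstring` are made up for this example. The code is templated on the archive type so that it compiles without Boost headers. With a real Boost archive the caller would also need to include boost/serialization/vector.hpp, and the assumption here is that the archive then stores each wchar_t element individually rather than as raw bytes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy the wide string into a vector of wchar_t and
// save the vector instead of the string itself.
template <class Archive>
void save_wstring(Archive& ar, const std::wstring& ws) {
    const std::vector<wchar_t> v(ws.begin(), ws.end());
    ar << v;
}

// Hypothetical helper: load the vector back and rebuild the wide string.
template <class Archive>
void load_wstring(Archive& ar, std::wstring& ws) {
    std::vector<wchar_t> v;
    ar >> v;
    ws.assign(v.begin(), v.end());
}
```

This keeps the stock archive classes in use, at the cost of calling the helpers explicitly wherever a wide string is serialized.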

Submitted as ticket 1836.
Thanks,
Jeff
participants (2)
- Jeffrey Faust
- Robert Ramey