On Thu, Sep 4, 2014 at 11:59 AM, Robert Ramey
Beman Dawes wrote
The specific crash message is:
*** Error in
`../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive':
double free or corruption (!prev): 0x00000000015f6f90 ***
It occurs for clang, gcc, and intel compilers, using libstdc++. It does not occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or 12.0.
Although the regression tests are only showing failures on non-Windows systems, the failure is also easy to reproduce using cygwin/gcc on Windows. It occurs in both C++03 and C++11 modes.
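For readers who haven't chased one of these before: a "double free" involving a codecvt facet is classically an ownership problem. The sketch below only illustrates what the message usually means; it is not a claim about the root cause here, and demo_facet is a trivial stand-in for the real utf8_codecvt_facet.

#include <cstddef>
#include <locale>
#include <sstream>

// Trivial stand-in facet; only the ownership rules matter for this sketch.
struct demo_facet : std::codecvt<wchar_t, char, std::mbstate_t> {
    explicit demo_facet(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}
};

int main() {
    std::wostringstream out;

    // Correct: refs == 0 means the locale machinery owns the facet and
    // deletes it exactly once when the last locale referencing it goes away.
    out.imbue(std::locale(out.getloc(), new demo_facet));

    // Classic double free (left commented out on purpose):
    //   demo_facet * f = new demo_facet;          // refs == 0, locale-owned
    //   out.imbue(std::locale(out.getloc(), f));
    //   delete f;                                 // the locale deletes it again
    //
    // A facet that must be managed by hand should be constructed with
    // refs == 1 and kept alive as long as any locale refers to it.
    return 0;
}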
None of the other libraries (filesystem, log, program_options, property_tree) that use utf8_codecvt_facet are failing on develop.
This is mainly a heads up to let people know that the serialization problem in develop is being worked on, but it may be a day or two before I have a fix.
Hmmmm - this looks like new behavior.
Actually, this is the same problem Marshall ran into a year or so ago when he fixed boost/detail/utf8_codecvt_facet.hpp:

--- C:\Users\Beman\AppData\Local\Temp\TortoiseGit\utf253F.tmp\utf8_codecvt_facet-5ef03bf-left.hpp	2014-09-05 10:23:25.000000000 -0400
+++ C:\boost\modular\develop\libs\detail\include\boost\detail\utf8_codecvt_facet.hpp	2014-09-05 08:43:11.000000000 -0400
@@ -89,13 +89,13 @@
 namespace std { using ::mbstate_t; using ::size_t; }
 #endif
-#if !defined(__MSL_CPP__) && !defined(__LIBCOMO__)
+#if defined(_CPPLIB_VER) && (_CPPLIB_VER < 540)
 #define BOOST_CODECVT_DO_LENGTH_CONST const
 #else
 #define BOOST_CODECVT_DO_LENGTH_CONST
 #endif
 // maximum lenght of a multibyte string
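For reference, here is roughly how that macro ends up being used, paraphrasing the declaration in boost/detail/utf8_codecvt_facet.hpp (the SKETCH_ names below are placeholders, not the real ones). Old Dinkumware libraries (_CPPLIB_VER < 540) still declare do_length with a const state parameter, the signature from before a library defect report changed it; everyone else takes the state by non-const reference. If the #if picks the wrong variant, the derived declaration no longer matches the base class virtual, so it silently stops overriding do_length and the base implementation runs instead.

#include <cstddef>
#include <cwchar>
#include <locale>

// Placeholder for BOOST_CODECVT_DO_LENGTH_CONST from the real header.
#if defined(_CPPLIB_VER) && (_CPPLIB_VER < 540)
#define SKETCH_DO_LENGTH_CONST const   // old Dinkumware: const state parameter
#else
#define SKETCH_DO_LENGTH_CONST         // everyone else: non-const state parameter
#endif

struct sketch_utf8_facet : std::codecvt<wchar_t, char, std::mbstate_t> {
    explicit sketch_utf8_facet(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}
protected:
    // If the constness here does not match the standard library's own
    // declaration, this is a brand new virtual function rather than an
    // override, and the base class do_length is what actually runs.
    virtual int do_length(
        SKETCH_DO_LENGTH_CONST std::mbstate_t &,
        const char * from,
        const char * from_end,
        std::size_t max_limit
    ) const {
        // Illustration only: the real facet counts UTF-8 sequences here.
        std::size_t n = static_cast<std::size_t>(from_end - from);
        return static_cast<int>(n < max_limit ? n : max_limit);
    }
};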
I don't remember changing anything that might provoke this. Am I wrong, or is there some other change (perhaps in another library) that provokes this? Since C++11 we have had some problems with utf8_codecvt_facet due to confusion between the now "built-in" implementation and the original "home grown" version.
AFAIK, serialization is the only library that tries to switch between the std:: version and the boost:: version. It is quite clear the bug is in serialization (or even in libstdc++'s codecvt) rather than in the boost::detail code.
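To make concrete what "switch between the std:: version and the boost:: version" means, here is a hedged sketch. The macro USE_STD_UTF8_FACET and the home_grown_utf8_facet type are hypothetical stand-ins; the real selection logic lives in the serialization headers and the real facet in boost/detail/utf8_codecvt_facet.hpp.

#include <cstddef>
#include <locale>
#include <sstream>

#if defined(USE_STD_UTF8_FACET)   // hypothetical switch; needs C++11 <codecvt>
#include <codecvt>
typedef std::codecvt_utf8<wchar_t> utf8_facet;        // "built-in" implementation
#else
// Stand-in for the home-grown boost::detail facet.
struct home_grown_utf8_facet : std::codecvt<wchar_t, char, std::mbstate_t> {
    explicit home_grown_utf8_facet(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}
};
typedef home_grown_utf8_facet utf8_facet;
#endif

int main() {
    // Whichever facet was selected above, the archive code installs it into
    // the stream it writes through before doing any wide-character output.
    std::wostringstream os;
    os.imbue(std::locale(os.getloc(), new utf8_facet));
    os << L"wide output goes through the selected facet";
    return 0;
}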
It took some time to sort out because the behavior varied according to which combination of compiler version and compiler switches was selected, and no one has every combination on their desktop.
The bug is showing up regardless of the compiler version or switches. It is easy to demonstrate; just switch back and forth between the two versions of the #if line.
So fair warning: don't be too hasty about fixing this or declaring it fixed. I got trapped several times this way.
The #if bug and several other bugs in boost::detail were introduced while trying to make serialization work, around the time Marshall introduced his original patch. While those changes papered over the problem in serialization, they are causing bug reports to be posted against other libraries, particularly filesystem.
Also note that the facet seems to be used only for wide character strings, and lots of other libraries don't require those. So it might be wrong, and our tests might not be exhaustive enough to detect this.
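For context on where the wide characters enter the picture, here is a minimal sketch of the kind of wide-character archive the failing test exercises. The real test_array_xml_warchive serializes arrays and does round-trip checks; the file name demo.xml is just for illustration, and the program has to be linked against the serialization and wserialization libraries.

#include <fstream>
#include <string>
#include <boost/archive/xml_woarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/string.hpp>

int main() {
    std::wofstream ofs("demo.xml");
    // Constructing the wide XML archive imbues a UTF-8 codecvt facet into
    // the stream (unless the no_codecvt flag is passed), which is exactly
    // the code path under discussion.
    boost::archive::xml_woarchive oa(ofs);
    std::wstring greeting(L"hello");
    int count = 1;
    oa << boost::serialization::make_nvp("greeting", greeting);
    oa << boost::serialization::make_nvp("count", count);
    return 0;
}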
In filesystem, all BSD-based operating systems (such as Mac OS X) use the boost::detail code.
This raises another interesting question. For many years we've been relying on Ron Garcia's original codecvt facet, which has worked fine. This in spite of the fact that it was never reviewed, and attempts to include it in Boost outside of the detail directories were rebuffed. I snuck its documentation and tests into the serialization library because I needed it and had no other choice.
But now it's sort of intertwined with the std implementation (IIRC), which is part of the problem.
Does any Boost library other than serialization try to switch between boost:: and std:: versions?
A better solution might be a new library for codecvt facets. There is a rich opportunity here.
Why? Microsoft, for example, ships codecvt facets for 79 character sets, including the difficult Asian character sets. Why should Boost try to duplicate the work that vendors have already done, particularly now that Unicode has become predominant? --Beman