utf8_codecvt facet - volunteer opportunity

Our implementation of a codecvt facet for converting between internal wide characters and UTF-8 requires some maintenance. We have had our own implementation since 2001. Since then, the standard library has come to specify some implementations, and some libraries have implemented some or all of this aspect of the standard library. I've come to believe that we should:

a) tweak our code to use the standard library implementations if they exist.
b) tweak our code to use the names of the standard library implementation.
c) tweak our build so that our version replaces the standard library implementation when and only when there is no standard implementation available.

This seemingly simple task is complicated by the following circumstances:

a) The relationship between UCS and Unicode is pretty confusing.
b) The interface for codecvt is confusing.
c) A number of standard library implementations have a non-conforming codecvt interface.
d) A number of standard library implementations don't have any codecvt header at all. This is not a problem for our current implementation because we never include it.

Anyone interested in working on this should contact Marshall, Beman or myself (in that order of precedence).

Robert Ramey
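To make the point about the codecvt interface concrete, here is a minimal sketch of calling a facet's out() member directly. It assumes a standard library that ships <codecvt> and std::codecvt_utf8 (deprecated since C++17 but still widely available); the helper name encode_with_facet is hypothetical and is not part of the existing utf8_codecvt code.

```cpp
#include <codecvt>    // std::codecvt_utf8 (assumed to be available)
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 by driving the
// facet's out() member by hand instead of going through a stream.
inline std::string encode_with_facet(const std::wstring& in) {
    if (in.empty()) return std::string();

    std::codecvt_utf8<wchar_t> facet;          // default mode: no BOM is written
    std::mbstate_t state = std::mbstate_t();   // zero-initialized conversion state

    // Reserve the worst case reported by the facet for every input character.
    std::string out(in.size() * static_cast<std::size_t>(facet.max_length()), '\0');

    const wchar_t* from_next = 0;  // set by out(): first unconverted input char
    char* to_next = 0;             // set by out(): one past the last byte written

    // Seven arguments: state, input range, input cursor,
    // output range, output cursor.
    std::codecvt_base::result r = facet.out(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("wide to UTF-8 conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The caller has to manage the state object, both cursors and the output buffer size, which is a fair part of why the interface is easy to get wrong.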

On Mon, Dec 24, 2012 at 5:53 PM, Robert Ramey <ramey@rrsd.com> wrote:
Our implementation of a codecvt facet for converting between internal wide characters and UTF-8 requires some maintenance.
We have had our own implementation since 2001. Since then, the standard library has come to specify some implementations, and some libraries have implemented some or all of this aspect of the standard library. I've come to believe that we should:
a) tweak our code to use the standard library implementations if they exist.
b) tweak our code to use the names of the standard library implementation.
c) tweak our build so that our version replaces the standard library implementation when and only when there is no standard implementation available.
This seemingly simple task is complicated by the following circumstances:
a) The relationship between UCS and Unicode is pretty confusing.
b) The interface for codecvt is confusing.
c) A number of standard library implementations have a non-conforming codecvt interface.
d) A number of standard library implementations don't have any codecvt header at all. This is not a problem for our current implementation because we never include it.
e) VC++ 2012 actually has the <codecvt> header functions, but puts them in separate headers that live in a "cvt" sub-directory, along with a great pile of codecvt facets and some conversion utilities such as wstring_convert that the standard puts in <locale>. So you have to write #include <cvt/wstring_convert> instead of #include <locale> to get wstring_convert.

<aside>That's probably yet another example of how legally and/or politically difficult it can be to modify the contents of an existing header.</aside>

So, yes, someone does need to address these issues.

--Beman
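The proposal to use the standard facet when it exists and fall back to our own otherwise could be sketched roughly as follows. This is only an illustration under stated assumptions: __has_include is a C++17 feature that 2012-era compilers do not provide (a real build would more likely rely on a configuration macro), the names EXAMPLE_HAS_STD_CODECVT and to_utf8 are hypothetical, and the fallback branch is a simplified stand-in for the existing facet that treats each wchar_t as one code point and ignores surrogate pairs.

```cpp
#include <cstddef>
#include <locale>   // std::wstring_convert
#include <string>

// Detect the standard header where the compiler lets us ask.
#if defined(__has_include)
#  if __has_include(<codecvt>)
#    include <codecvt>
#    define EXAMPLE_HAS_STD_CODECVT 1   // hypothetical macro name
#  endif
#endif

#ifdef EXAMPLE_HAS_STD_CODECVT

// Standard implementation available: use it under its standard name.
inline std::string to_utf8(const std::wstring& in) {
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    return conv.to_bytes(in);
}

#else

// No standard header: simplified fallback encoder (assumption: one
// code point per wchar_t, no surrogate handling, no validation).
inline std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long c = static_cast<unsigned long>(in[i]);
        if (c < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

#endif
```

Callers see a single to_utf8 either way, which is the property the build change is after. Handling a layout like the "cvt" sub-directory described above would need an extra detection branch that is not shown here.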
participants (2): Beman Dawes, Robert Ramey