
As some may know, I am working on a Unicode library that I plan to submit to Boost fairly soon.
Take a look at the Boost.Locale proposal.
The codecs in that library are based around iterators and ranges, but since there was some demand for support for codecvt facets I am working on adapting those into that form as well.
Unfortunately, it seems it is only possible to subclass std::codecvt<char, char, mbstate_t> and std::codecvt<wchar_t, char, mbstate_t>.
Yes, these are actually the only specialized classes. Moreover, std::codecvt<char, char, mbstate_t> is supposed to be a "noconvert" facet.
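To illustrate the "noconvert" point, here is a minimal sketch that asks the classic locale's char-to-char facet whether it ever converts anything. The helper name narrow_codecvt_is_noconv is made up for this example; only use_facet and always_noconv come from the standard library.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical helper: returns true if the char <-> char codecvt facet of
// the classic "C" locale reports that it never performs a conversion.
bool narrow_codecvt_is_noconv()
{
    typedef std::codecvt<char, char, std::mbstate_t> facet_type;
    const facet_type& facet =
        std::use_facet<facet_type>(std::locale::classic());
    return facet.always_noconv();
}
```

On a conforming implementation this should return true, which is why that specialization is not a useful place to hook in a real encoding conversion.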
I personally don't know and understand that much about iostreams/locales, but I have looked quickly at libstdc++'s implementation and it doesn't seem like it is possible for std::locale to contain any other instance of codecvt.
You can derive from these two classes and re-implement them (as I did in Boost.Locale). I also strongly recommend taking a look at locales and iostreams in the standard library if you are working with Unicode in C++.
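Roughly, deriving looks like the sketch below. It is not the Boost.Locale code; it is a deliberately tiny facet that treats the narrow side as Latin1 (one byte per character), written with C++11 spellings (override, noexcept). The names latin1_codecvt and widen_latin1 are invented for this example.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Illustrative facet: narrow side is Latin1, wide side holds code points.
class latin1_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
{
public:
    explicit latin1_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    // Narrow -> wide: every byte maps to the code point with the same value.
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override
    {
        for (; from != from_end && to != to_end; ++from, ++to)
            *to = static_cast<wchar_t>(static_cast<unsigned char>(*from));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    // Wide -> narrow: anything above U+00FF cannot be represented.
    result do_out(std::mbstate_t&, const wchar_t* from,
                  const wchar_t* from_end, const wchar_t*& from_next,
                  char* to, char* to_end, char*& to_next) const override
    {
        for (; from != from_end && to != to_end; ++from, ++to) {
            if (static_cast<unsigned long>(*from) > 0xFFul)
                break;
            *to = static_cast<char>(static_cast<unsigned char>(*from));
        }
        from_next = from;
        to_next = to;
        if (from == from_end)
            return ok;
        return to == to_end ? partial : error;
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 1; }   // fixed width
    int do_max_length() const noexcept override { return 1; }
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override
    {
        const std::size_t n = static_cast<std::size_t>(from_end - from);
        return static_cast<int>(n < max ? n : max);
    }
};

// Installs the facet in a locale and converts through the base-class
// interface, which is the same lookup a wide file stream would perform.
std::wstring widen_latin1(const std::string& bytes)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    std::locale loc(std::locale::classic(), new latin1_codecvt);
    const facet_type& facet = std::use_facet<facet_type>(loc);

    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty())
        return out;

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;
    facet.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
             &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The derived class reuses the id of std::codecvt<wchar_t, char, mbstate_t>, so the locale hands it back whenever that facet is requested. A locale built this way can also be passed to imbue() on a wide file stream before the file is opened.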
What I wonder is if there is really a point to facets, then. std::codecvt<wchar_t, char, mbstate_t> means that the in-memory charset would be UTF-16 or UTF-32 (depending on the size of wchar_t) while the file would be UTF-8.
Not exactly: the narrow encoding may be any char-based encoding, even something like Latin1 or Shift-JIS (and UTF-8 as well).
The problem is that wchar_t is platform-dependent and not really reliable, so it's not really something I'd recommend to use as the in-memory representation to deal with Unicode.
Welcome to the broken Unicode world of C++. Yes, wchar_t is platform-dependent; if you want to use it you should support both encodings, UTF-16 and UTF-32 (technically it may even be 8 bits wide, but there are no such implementations). C++0x provides char16_t and char32_t to fix this bug in the standard.
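As a small sketch of what "support both" means in practice, code that uses wchar_t usually has to branch on its width. The function name guess_wchar_encoding is made up here, and mapping width to an encoding is only the common convention, not something the standard promises. The static_asserts need a C++0x compiler.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: guesses the Unicode encoding form of wchar_t strings
// from the width of wchar_t alone. This is a convention, not a guarantee.
std::string guess_wchar_encoding()
{
    const std::size_t bits = sizeof(wchar_t) * CHAR_BIT;
    if (bits >= 32)
        return "UTF-32";
    if (bits >= 16)
        return "UTF-16";
    return "narrow";  // permitted by the standard, rare in practice
}

// The C++0x character types have minimum widths that do not vary by platform.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds UTF-16 units");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds UTF-32 units");
```

On Windows this typically reports UTF-16, and on most Unix-like systems UTF-32.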
Why do people even use utf8_codecvt_facet anyway? What's wrong with dealing with UTF-8 rather than maybe UTF-16 or UTF-32?
Ask Windows developers: they use wide strings because it is the only way to work correctly with their OS. Artyom