
Instead of defining character types per character set, you could use a specialized char_traits class. char_traits has a state_type typedef, which is used with codecvt from the I/O stream library. For char and wchar_t it is mbstate_t, which is also the state type in the standard codecvt specializations. (codecvt performs code conversion between character types; wfstream, for example, uses it to convert a stream of chars on disk to wchar_ts in memory.) Because the traits class is a template parameter of basic_string, giving each encoding its own traits class with its own state_type would make the various basic_string types distinct and let them carry information about the character encoding, without writing a whole lot of new code. To be honest, I'm only just beginning to look into this myself, so I'm afraid I don't have much more information to give you, but I do think this would be the simplest way to handle this part of your project.

- James

Phil Endecott wrote:
[snip]
If latin1string has a constructor from std::string (which is its own base type) that's fine, i.e. we can still write:
latin1string s2 = s1.substr(1,5);
but unfortunately we can also write
latin2string s3 = s1.substr(1,5);
which is not so good.
So a different approach is to define a set of character-set-specific character types, and build string types from them:
typedef char8_t latin1char;
typedef char8_t latin2char;
[/snip]