Re: [boost] Strings tagged with their character set

24 Sep 2007

Instead of defining character types per character set, you could use a 
specialized char_traits class. It has a state_type typedef, which is 
used with codecvt from the I/O streams library. For char and wchar_t 
the default state_type is mbstate_t, which is also what the standard 
specializations of codecvt use. (codecvt performs code conversion 
between character types; wfstream, for example, uses it to convert a 
stream of chars on disk to wchar_ts in memory.)
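A minimal C++11 sketch of the two points above (not code from this 
thread): the static_asserts check that both standard char_traits 
specializations use mbstate_t, and widen_bytes() (a hypothetical helper 
name) runs narrow bytes through a locale's codecvt<wchar_t, char, 
mbstate_t> facet, the same facet a wfstream's buffer uses for that 
locale.

```cpp
#include <cassert>
#include <cstddef>     // std::size_t
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>
#include <type_traits>

// Both standard char_traits specializations use mbstate_t as state_type.
static_assert(std::is_same<std::char_traits<char>::state_type, std::mbstate_t>::value,
              "char_traits<char>::state_type is mbstate_t");
static_assert(std::is_same<std::char_traits<wchar_t>::state_type, std::mbstate_t>::value,
              "char_traits<wchar_t>::state_type is mbstate_t");

// Hypothetical helper: convert narrow bytes to wide characters with the
// codecvt<wchar_t, char, mbstate_t> facet of the given locale.
inline std::wstring widen_bytes(const std::string& bytes, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    std::wstring out(bytes.size(), L'\0');    // never need more wide chars than bytes
    const char* from_next = bytes.data();
    wchar_t* to_next = &out[0];

    Cvt::result r = cvt.in(state,
                           bytes.data(), bytes.data() + bytes.size(), from_next,
                           &out[0], &out[0] + out.size(), to_next);
    // partial (e.g. truncated input) and error are both treated as failures.
    if (r != Cvt::ok)
        throw std::runtime_error("codecvt::in could not convert the input");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));  // keep converted part
    return out;
}

inline void demo_widen() {
    // Assumes the classic "C" locale maps ASCII bytes to the same wide values.
    assert(widen_bytes("abc", std::locale::classic()) == L"abc");
}
```

The ASCII check assumes an implementation whose classic locale widens 
ASCII bytes unchanged, as common implementations do; other byte values 
depend on the locale.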

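If you give each character set its own char_traits class with a 
different state_type, the various basic_string types become distinct 
types that carry information about the character encoding, and you 
don't have to write a whole lot of new code. Since substr() returns 
the same basic_string type it is called on, 
"latin2string s3 = s1.substr(1,5);" would then fail to compile.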
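A minimal C++11 sketch of the idea, assuming the hypothetical names 
latin1_state, latin2_state and tagged_char_traits (none of them come 
from Boost or the standard library): each traits class inherits every 
operation from std::char_traits<char> and overrides only state_type, so 
the two string typedefs become unrelated types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical tag types, one per character set. They only need to be
// distinct; an empty struct is default-constructible and copyable.
struct latin1_state {};
struct latin2_state {};

// Reuse all of std::char_traits<char> and replace only state_type, so each
// character set gets a traits class of its own.
template <class State>
struct tagged_char_traits : std::char_traits<char> {
    typedef State state_type;
};

typedef std::basic_string<char, tagged_char_traits<latin1_state> > latin1string;
typedef std::basic_string<char, tagged_char_traits<latin2_state> > latin2string;

// Different traits arguments give unrelated basic_string types.
static_assert(!std::is_convertible<latin1string, latin2string>::value,
              "no implicit conversion between character sets");

inline void demo_tagged_strings() {
    latin1string s1 = "Hello, world";
    latin1string s2 = s1.substr(1, 5);    // OK: substr() returns latin1string
    assert(s2 == "ello,");
    // latin2string s3 = s1.substr(1, 5); // would not compile: different types
}
```

This only covers strings; using the same traits with file streams would, 
at a minimum, need a codecvt facet with the matching state_type, since 
basic_filebuf looks up codecvt<charT, char, typename traits::state_type>.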

To be honest, I'm only just beginning to look into this myself, so I'm 
afraid I don't have a whole lot of information to give you, but I do 
think this would be the simplest way to handle this part of your project.

- James

Phil Endecott wrote:
[snip]
...
If latin1string has a constructor from std::string (which is its own 
base type) that's fine, i.e. we can still write:
latin1string s2 = s1.substr(1,5);
but unfortunately we can also write
latin2string s3 = s1.substr(1,5);
which is not so good.
So a different approach is to define a set of character-set-specific 
character types, and build string types from them:
typedef char8_t latin1char;
typedef char8_t latin2char;
[/snip]

James Porter