
Erik Wien wrote:
The basic idea I have been working around is to make an encoded_string class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made possible through an encoding_traits class which contains all necessary implementation details for working on strings of code units.
The outline of the encoding traits class looks something like this:
template<typename encoding> struct encoding_traits
{
    // Type definitions for code_units etc.
    // Is the encoding fixed width? (allows a good deal of iterator optimizations)
    // Algorithms for iterating forwards and backwards over code units.
    // Function for converting a series of code units to a Unicode code point.
    // Any other operations that are encoding specific.
};
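For concreteness, a specialization of such a traits class for UTF-16 might look roughly like the sketch below. The member names (is_fixed_width, next, decode, is_lead_surrogate) are made up for illustration and are not taken from the proposal above.

#include <cstdint>

typedef std::uint32_t code_point;

struct utf16 {};

template<typename encoding> struct encoding_traits;

template<> struct encoding_traits<utf16>
{
    typedef std::uint16_t code_unit;

    // UTF-16 is not fixed width: code points outside the BMP are
    // encoded as a surrogate pair (two code units).
    static const bool is_fixed_width = false;

    static bool is_lead_surrogate(code_unit u)
    {
        return u >= 0xD800 && u <= 0xDBFF;
    }

    // Advance an iterator over one encoded character.
    template<typename Iterator>
    static Iterator next(Iterator it)
    {
        return it + (is_lead_surrogate(*it) ? 2 : 1);
    }

    // Convert the code units starting at 'it' into a code point.
    template<typename Iterator>
    static code_point decode(Iterator it)
    {
        code_unit lead = *it;
        if (!is_lead_surrogate(lead))
            return lead;
        code_unit trail = *(it + 1);
        return 0x10000 + ((code_point(lead) - 0xD800) << 10)
                       + (code_point(trail) - 0xDC00);
    }
};

A fixed-width encoding such as UTF-32 would get a trivial specialization (is_fixed_width = true, next simply increments), which is where the compile-time dispatch would pay off.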
Why do you need the traits at compile time?

- Why would the user want to change the encoding, especially between UTF-16 and UTF-32?

- Why would the user want to specify the encoding at compile time? Are there performance benefits to that? Basically, if we agree that UTF-32 is not needed, then UTF-16 is the only encoding which does not require complex handling. Maybe, for the other encodings, using virtual functions in the character iterator is OK? And if iterators have "abstract characters" as their value_type, maybe the overhead of that is much larger than a virtual function call, even for UTF-16. (As a side note, a discussion of templated vs. non-templated interfaces seems a reasonable addition to a thesis. It's a sure thing that if anybody wrote such a thesis in our lab, he would be asked to justify such global decisions.)

- What if the user wants to specify the encoding at run time? For example, XML files specify their encoding explicitly. I'd want to use an ASCII/UTF-8 encoding when the XML document is 8-bit, and UTF-16 when it's Unicode. (A rough sketch of this is appended below.)

- Volodya
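For illustration only, the run-time alternative suggested above (virtual functions in the character iterator, with the encoding chosen when, say, an XML declaration is read) might look roughly like this. All class and function names here are invented for the example, not taken from any actual proposal:

#include <cstddef>
#include <cstdint>
#include <string>

typedef std::uint32_t code_point;

// Abstract encoding, selected at run time.
struct encoding
{
    virtual ~encoding() {}

    // Decode one character starting at 'p', store it in 'cp',
    // and return the number of bytes consumed.
    virtual std::size_t decode(const unsigned char* p, code_point& cp) const = 0;
};

struct ascii_encoding : encoding
{
    std::size_t decode(const unsigned char* p, code_point& cp) const
    {
        cp = *p;
        return 1;
    }
};

struct utf8_encoding : encoding
{
    // No error checking here; this only shows the shape of the interface.
    std::size_t decode(const unsigned char* p, code_point& cp) const
    {
        if (p[0] < 0x80) { cp = p[0]; return 1; }
        if (p[0] < 0xE0) { cp = (p[0] & 0x1F) << 6 | (p[1] & 0x3F); return 2; }
        if (p[0] < 0xF0) { cp = (p[0] & 0x0F) << 12 | (p[1] & 0x3F) << 6
                              | (p[2] & 0x3F); return 3; }
        cp = (p[0] & 0x07) << 18 | (p[1] & 0x3F) << 12
           | (p[2] & 0x3F) << 6 | (p[3] & 0x3F);
        return 4;
    }
};

// Pick an encoding from, e.g., an XML declaration.
const encoding* encoding_for(const std::string& name)
{
    static const ascii_encoding ascii;
    static const utf8_encoding utf8;
    if (name == "US-ASCII") return &ascii;
    if (name == "UTF-8")    return &utf8;
    return 0; // a real implementation would also handle UTF-16, etc.
}

A character iterator would then hold a pointer to such an encoding object and pay one virtual call per character, instead of fixing the encoding at compile time.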