
Erik Wien wrote:
The basic idea I have been working around is to make an encoded_string class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made possible through an encoding_traits class which contains all necessary implementation details for working on strings of code units.
The outline of the encoding traits class looks something like this:
template<typename encoding> struct encoding_traits
{
    // Type definitions for code_units etc.
    // Is the encoding fixed width? (allows a good deal of iterator optimizations)
    // Algorithms for iterating forwards and backwards over code units.
    // Function for converting a series of code units to a Unicode code point.
    // Any other operations that are encoding specific.
};
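For concreteness, a specialization of such a traits class for UTF-16 might look roughly like the sketch below. The member names (is_fixed_width, next, decode, is_lead_surrogate) are made up for illustration and are not taken from the proposal above.

#include <cstdint>

typedef std::uint32_t code_point;

struct utf16 {};

template<typename encoding> struct encoding_traits;

template<> struct encoding_traits<utf16>
{
    typedef std::uint16_t code_unit;

    // UTF-16 is not fixed width: code points outside the BMP are
    // encoded as a surrogate pair (two code units).
    static const bool is_fixed_width = false;

    static bool is_lead_surrogate(code_unit u)
    {
        return u >= 0xD800 && u <= 0xDBFF;
    }

    // Advance an iterator over one encoded character.
    template<typename Iterator>
    static Iterator next(Iterator it)
    {
        return it + (is_lead_surrogate(*it) ? 2 : 1);
    }

    // Convert the code units starting at 'it' into a code point.
    template<typename Iterator>
    static code_point decode(Iterator it)
    {
        code_unit lead = *it;
        if (!is_lead_surrogate(lead))
            return lead;
        code_unit trail = *(it + 1);
        return 0x10000 + ((code_point(lead) - 0xD800) << 10)
                       + (code_point(trail) - 0xDC00);
    }
};

A fixed-width encoding such as UTF-32 would get a trivial specialization (is_fixed_width = true, next simply increments), which is where the compile-time dispatch would pay off.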
Why do you need the traits at compile time?

- Why would the user want to change the encoding, especially between UTF-16 and UTF-32?

- Why would the user want to specify the encoding at compile time? Are there performance benefits to that? Basically, if we agree that UTF-32 is not needed, then UTF-16 is the only encoding which does not require complex handling. Maybe, for the other encodings, using virtual functions in the character iterator is OK? And if iterators have "abstract characters" as their value_type, maybe the overhead of that is much larger than a virtual function call, even for UTF-16. (As a side note, a discussion of templated vs. non-templated interfaces seems a reasonable addition to a thesis. It's a sure thing that if anybody wrote such a thesis in our lab, he would be asked to justify such global decisions.)

- What if the user wants to specify the encoding at run time? For example, XML files specify their encoding explicitly. I'd want to use an ASCII/UTF-8 encoding when the XML document is 8-bit, and UTF-16 when it's Unicode. (A rough sketch of this is appended below.)

- Volodya
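For illustration only, the run-time alternative suggested above (virtual functions in the character iterator, with the encoding chosen when, say, an XML declaration is read) might look roughly like this. All class and function names here are invented for the example, not taken from any actual proposal:

#include <cstddef>
#include <cstdint>
#include <string>

typedef std::uint32_t code_point;

// Abstract encoding, selected at run time.
struct encoding
{
    virtual ~encoding() {}

    // Decode one character starting at 'p', store it in 'cp',
    // and return the number of bytes consumed.
    virtual std::size_t decode(const unsigned char* p, code_point& cp) const = 0;
};

struct ascii_encoding : encoding
{
    std::size_t decode(const unsigned char* p, code_point& cp) const
    {
        cp = *p;
        return 1;
    }
};

struct utf8_encoding : encoding
{
    // No error checking here; this only shows the shape of the interface.
    std::size_t decode(const unsigned char* p, code_point& cp) const
    {
        if (p[0] < 0x80) { cp = p[0]; return 1; }
        if (p[0] < 0xE0) { cp = (p[0] & 0x1F) << 6 | (p[1] & 0x3F); return 2; }
        if (p[0] < 0xF0) { cp = (p[0] & 0x0F) << 12 | (p[1] & 0x3F) << 6
                              | (p[2] & 0x3F); return 3; }
        cp = (p[0] & 0x07) << 18 | (p[1] & 0x3F) << 12
           | (p[2] & 0x3F) << 6 | (p[3] & 0x3F);
        return 4;
    }
};

// Pick an encoding from, e.g., an XML declaration.
const encoding* encoding_for(const std::string& name)
{
    static const ascii_encoding ascii;
    static const utf8_encoding utf8;
    if (name == "US-ASCII") return &ascii;
    if (name == "UTF-8")    return &utf8;
    return 0; // a real implementation would also handle UTF-16, etc.
}

A character iterator would then hold a pointer to such an encoding object and pay one virtual call per character, instead of fixing the encoding at compile time.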