Re: [boost] [unicode] Interest Check / Proof of Concept

19 Nov 2008

> ...
> There's still a lot missing from the code (most notably, dynamically-sized
> strings and string concatenation), but here's a rundown of what *is*
> present:
> * Compile-time and run-time tagged strings
> * Re-encoding of strings based on compile-/run-time tags
> * Uses simple memory copying when source and dest encodings are the same
> * Forward iterators to step through code points in strings
> If you'd like to take a look at the code, it's available here:
> http://www.teamboxel.com/misc/unicode.tar.gz . I've tested it in gcc 4.3.2
> and MSVC8, but most modern compilers should be able to handle it. Comments
> and criticisms are, of course, welcome.

I think it looks like a good start. I'm getting a warning about a
string->wchar_t conversion, though.
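As a rough illustration of the first three items in the feature list above (compile-time tagged strings, re-encoding driven by the tags, and a plain memory copy when both encodings match), here is a minimal C++17 sketch. It is not taken from unicode.tar.gz: the tags `latin1` and `utf32`, and the names `tagged_string`, `to_code_point`, `from_code_point` and `reencode`, are all hypothetical. Both encodings are fixed-width to keep it short, and Latin-1 output substitutes '?' for code points it cannot represent.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical encoding tags; each names the code-unit type it stores.
struct latin1 { using unit = unsigned char; };  // code points U+0000..U+00FF
struct utf32  { using unit = char32_t; };       // one unit per code point

// A string whose encoding is fixed at compile time by the Enc tag.
template <class Enc>
struct tagged_string {
    std::vector<typename Enc::unit> units;
};

// Decode one code unit to a code point (both encodings are fixed-width).
inline char32_t to_code_point(latin1, unsigned char u) { return u; }
inline char32_t to_code_point(utf32, char32_t u) {
    assert(u <= 0x10FFFF);  // must lie in the Unicode code point range
    return u;
}

// Encode one code point; Latin-1 cannot hold U+0100 and up, so emit '?'.
inline unsigned char from_code_point(latin1, char32_t cp) {
    return static_cast<unsigned char>(cp <= 0xFF ? cp : U'?');
}
inline char32_t from_code_point(utf32, char32_t cp) { return cp; }

// Re-encode based on the compile-time tags (if constexpr needs C++17).
template <class To, class From>
tagged_string<To> reencode(const tagged_string<From>& src) {
    tagged_string<To> dst;
    if constexpr (std::is_same_v<To, From>) {
        // Same encoding: copy the raw units without decoding anything.
        dst.units.resize(src.units.size());
        if (!src.units.empty())
            std::memcpy(dst.units.data(), src.units.data(),
                        src.units.size() * sizeof(typename From::unit));
    } else {
        // Different encodings: decode each unit to a code point, then encode it.
        dst.units.reserve(src.units.size());
        for (auto u : src.units)
            dst.units.push_back(from_code_point(To{}, to_code_point(From{}, u)));
    }
    return dst;
}
```

Because the encoding is part of the type, the same-encoding test is resolved at compile time; a run-time tagged string would instead have to compare its tags before choosing the memcpy path.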

Just a couple of comments/questions...
- I don't think the global rt encoding objects are the way to go. I would
just have each string object declare the encoding object, either as a member
variable or locally inside a member function as needed. Since the encoding
objects don't have any member variables, the cost is negligible.
- Would it be possible to merge the ct/rt classes into a single type?
- Maybe encode/decode should be free functions, algorithm-like.

You might have something like:

estring<> s = ...;  // Create an encodable string with some default encoding (ascii?)
encode(s, utf8());  // utf8 is a functor object that returns a utf8_encoder object.

I guess if you go this way, the estring class would just contain an encoded
string associated with the encoder type. It might be an interesting
approach. Still, a good start.
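One way the free-function design could be fleshed out is the minimal sketch below. It assumes that each encoder is a stateless type exposing a `decode` member (bytes to one code point) and an `encode` member (one code point to bytes), and that `encode` builds its encoders locally instead of relying on global objects. `estring`, `encode`, `utf8` and `utf8_encoder` follow the names in the example above; `ascii`, `ascii_encoder`, and the member functions are hypothetical. Input is assumed to be valid, and non-ASCII code points become '?' when encoding to ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stateless encoders: no data members, so building one wherever
// it is needed (local variable or temporary) is essentially free.
struct ascii_encoder {
    // Decode the code point starting at s[i] and advance i (input assumed valid).
    char32_t decode(const std::string& s, std::size_t& i) const {
        return static_cast<unsigned char>(s[i++]);
    }
    // Append cp; anything outside ASCII becomes '?'.
    void encode(char32_t cp, std::string& out) const {
        out.push_back(cp < 0x80 ? static_cast<char>(cp) : '?');
    }
};

struct utf8_encoder {
    char32_t decode(const std::string& s, std::size_t& i) const {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        if (lead < 0x80) return lead;
        // Number of continuation bytes, read from the lead byte's high bits.
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        char32_t cp = lead & (0x3F >> extra);
        while (extra-- > 0)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        return cp;
    }
    void encode(char32_t cp, std::string& out) const {
        assert(cp <= 0x10FFFF);
        auto put = [&out](char32_t v) { out.push_back(static_cast<char>(v)); };
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
};

// Functor objects that return an encoder, as in encode(s, utf8()).
struct ascii { ascii_encoder operator()() const { return {}; } };
struct utf8  { utf8_encoder operator()() const { return {}; } };

// An encoded string associated with its encoder type (default: ASCII).
template <class Encoder = ascii_encoder>
struct estring {
    std::string data;  // bytes encoded with Encoder
};

// Free, algorithm-like re-encoding: decode with the source encoder and
// re-encode with the encoder produced by make().
template <class From, class MakeEncoder>
auto encode(const estring<From>& s, MakeEncoder make) {
    using To = decltype(make());
    const From from{};   // created locally instead of as a global object
    const To to = make();
    estring<To> result;
    for (std::size_t i = 0; i < s.data.size();)
        to.encode(from.decode(s.data, i), result.data);
    return result;
}
```

Unlike the one-line example, this `encode` returns a new `estring<utf8_encoder>` rather than modifying `s` in place, since the encoder is part of the string's type.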

Andrew Sutton
andrew.n.sutton@gmail.com