
There's still a lot missing from the code (most notably, dynamically-sized strings and string concatenation), but here's a rundown of what *is* present:
* Compile-time and run-time tagged strings
* Re-encoding of strings based on compile-/run-time tags
* Uses simple memory copying when source and dest encodings are the same
* Forward iterators to step through code points in strings
If you'd like to take a look at the code, it's available here: http://www.teamboxel.com/misc/unicode.tar.gz . I've tested it in gcc 4.3.2 and MSVC8, but most modern compilers should be able to handle it. Comments and criticisms are, of course, welcome.
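
To make the feature list above concrete, here is a minimal sketch of compile-time tagged strings with re-encoding. It is not taken from the tarball: `utf8_tag`, `utf32_tag`, `tagged_string`, `decode_next`, `append`, `recoder` and `recode` are all hypothetical names, the UTF-8 decoder assumes well-formed input, and a plain index cursor stands in for a real forward iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoding tags (illustrative names, not the library's).
struct utf8_tag  { typedef char     char_type; };
struct utf32_tag { typedef char32_t char_type; };

// A string whose encoding is fixed at compile time by its tag.
template <class Enc>
struct tagged_string {
    std::basic_string<typename Enc::char_type> data;
};

// Read one code point at index i and advance i past it.
// Assumes well-formed UTF-8; no validation is attempted.
inline char32_t decode_next(const tagged_string<utf8_tag>& s, std::size_t& i) {
    assert(i < s.data.size());
    unsigned char b = static_cast<unsigned char>(s.data[i++]);
    if (b < 0x80) return b;
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
    char32_t cp = b & (0x3F >> extra);
    while (extra-- > 0)
        cp = (cp << 6) | (static_cast<unsigned char>(s.data[i++]) & 0x3F);
    return cp;
}

inline char32_t decode_next(const tagged_string<utf32_tag>& s, std::size_t& i) {
    assert(i < s.data.size());
    return s.data[i++];
}

// Append one code point in the target encoding.
inline void append(tagged_string<utf8_tag>& s, char32_t cp) {
    if (cp < 0x80) {
        s.data += static_cast<char>(cp);
    } else if (cp < 0x800) {
        s.data += static_cast<char>(0xC0 | (cp >> 6));
        s.data += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        s.data += static_cast<char>(0xE0 | (cp >> 12));
        s.data += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s.data += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        s.data += static_cast<char>(0xF0 | (cp >> 18));
        s.data += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s.data += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s.data += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

inline void append(tagged_string<utf32_tag>& s, char32_t cp) { s.data += cp; }

// General case: step through the code points and re-encode each one.
template <class To, class From>
struct recoder {
    static tagged_string<To> apply(const tagged_string<From>& src) {
        tagged_string<To> out;
        for (std::size_t i = 0; i < src.data.size(); )
            append(out, decode_next(src, i));
        return out;
    }
};

// Same source and destination encoding: a plain copy, no decoding.
template <class Enc>
struct recoder<Enc, Enc> {
    static tagged_string<Enc> apply(const tagged_string<Enc>& src) { return src; }
};

template <class To, class From>
tagged_string<To> recode(const tagged_string<From>& src) {
    return recoder<To, From>::apply(src);
}
```

With these definitions, `recode<utf32_tag>(s)` decodes and re-encodes, while `recode<utf8_tag>(s)` on a UTF-8 string selects the partial specialization and simply copies the buffer. A run-time tagged variant would need to dispatch on a stored tag value instead of a template parameter.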
I think it looks like a good start. I'm getting a warning about a string->wchar_t conversion. Just a couple of comments/questions...

- I don't think the global rt encoding objects are the way to go. I would just have each string object declare the encoding object either as a member variable or as needed inside a member function. Since they don't have any member variables, the cost is negligible.
- Would it be possible to merge the ct/rt classes into a single type?
- Maybe encode/decode should be free functions, algorithm-like. You might have something like:

    estring<> s = ...;   // Create an encodeable string with some default encoding (ascii?)
    encode(s, utf8());   // utf8 is a functor object that returns a utf8_encoder object.

I guess if you go this way, the estring class would just contain an encoded string associated with the encoder type. It might be an interesting approach.

Still, a good start.

Andrew Sutton
andrew.n.sutton@gmail.com
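
One way the free-function suggestion above might be fleshed out is sketched below. Everything beyond the names `estring`, `encode`, `utf8` and `utf8_encoder` is an assumption made for illustration: the `get`/`put` encoder interface, the `ascii` tag and `ascii_encoder`, the `bytes` member, and the choice to have `encode` return a new string rather than modify its argument. The UTF-8 decoder assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoder interface: get() reads one code point and advances
// the index, put() appends one code point in this encoding.
struct ascii_encoder {
    static char32_t get(const std::string& s, std::size_t& i) {
        return static_cast<unsigned char>(s[i++]);
    }
    static void put(std::string& s, char32_t cp) {
        s += (cp < 0x80) ? static_cast<char>(cp) : '?';  // lossy fallback
    }
};

struct utf8_encoder {
    // Assumes well-formed UTF-8; no validation is attempted.
    static char32_t get(const std::string& s, std::size_t& i) {
        unsigned char b = static_cast<unsigned char>(s[i++]);
        if (b < 0x80) return b;
        int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
        char32_t cp = b & (0x3F >> extra);
        while (extra-- > 0)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        return cp;
    }
    static void put(std::string& s, char32_t cp) {
        if (cp < 0x80) {
            s += static_cast<char>(cp);
        } else if (cp < 0x800) {
            s += static_cast<char>(0xC0 | (cp >> 6));
            s += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            s += static_cast<char>(0xE0 | (cp >> 12));
            s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            s += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            s += static_cast<char>(0xF0 | (cp >> 18));
            s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            s += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

// Functor tags: calling one yields the matching encoder object.
struct ascii { ascii_encoder operator()() const { return ascii_encoder(); } };
struct utf8  { utf8_encoder  operator()() const { return utf8_encoder();  } };

// An encoded byte string associated with its encoder type.
template <class Encoder = ascii_encoder>
struct estring {
    std::string bytes;
};

// Free function: re-encode src with the encoder that the tag produces.
template <class From, class Tag>
auto encode(const estring<From>& src, Tag tag) -> estring<decltype(tag())> {
    typedef decltype(tag()) To;
    estring<To> out;
    for (std::size_t i = 0; i < src.bytes.size(); )
        To::put(out.bytes, From::get(src.bytes, i));
    return out;
}
```

Because the encoder is part of the `estring` type here, `encode(s, utf8())` yields an `estring<utf8_encoder>`, so a mismatch between encodings shows up at compile time. Merging this with run-time tags would need some form of type erasure or a stored tag, which this sketch does not attempt.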