
Andrew Sutton wrote:
I think it looks like a good start. I'm getting a warning about a string->wchar_t conversion.
I think gcc is complaining because it defines wchar_t as 32 bits. Honestly, wchar_t is pretty awful since its size is platform-dependent, but I don't think any compiler supports the new Unicode strings yet. :) I suppose I could have said "int16_t raw[] = { 'T', 'e', 's', 't', ... };", but that's not very readable!
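For comparison, here is a small sketch of the two spellings mentioned above: the spelled-out 16-bit array and a C++11 `u"..."` literal, which is the kind of "new Unicode string" that avoids wchar_t's platform-dependent width. The names `raw`, `text`, and `checked_length` are made up for illustration, and the sketch assumes a compiler with C++11 support and an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Spelled-out 16-bit code units, as suggested in the message: portable,
// but hard to read.
const std::int16_t raw[] = { 'T', 'e', 's', 't' };

// C++11 UTF-16 literal: char16_t is at least 16 bits wide everywhere,
// unlike wchar_t, whose width varies by platform.
const std::u16string text = u"Test";

// Asserts that both spellings hold the same code units and returns
// the shared length. (Hypothetical helper, for illustration only.)
std::size_t checked_length() {
    const std::size_t n = sizeof(raw) / sizeof(raw[0]);
    assert(text.size() == n);
    for (std::size_t i = 0; i < n; ++i)
        assert(text[i] == raw[i]);
    return n;
}
```

On a compiler without `char16_t`, only the array form is available, which is the readability trade-off described above.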
Just a couple comments/questions...
- I don't think the global rt encoding objects are the way to go. I would just have each string object declare the encoding object either as a member variable or as needed inside a member function. Since the encoding objects don't have any member variables, the cost is negligible.
This is probably workable. Do you envision something like the following?

    my_string.encode(source, utf8());

It would have the benefit of making the interface for ct_strings and rt_strings the same. For ct_strings, it would specialize on the type of the encoding parameter, and for rt_strings, it would wrap the encoding up in some object to give it virtual dispatch.
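A rough sketch of how that shared interface might look, not the actual library code. Everything here is an assumption made for illustration: the stateless tag types `utf8` and `ascii`, the choice of a vector of code points as the source, and the internals of `ct_string` and `rt_string`. The compile-time string binds the encoding through its template parameter, while the run-time string wraps the tag in a small polymorphic object.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stateless encoding tags; constructing one costs nothing.
struct utf8 {
    static void append(std::string& out, std::uint32_t cp) {
        assert(cp <= 0x10FFFF);  // outside the Unicode code space
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

struct ascii {
    // Replaces anything outside ASCII with '?' (an arbitrary choice here).
    static void append(std::string& out, std::uint32_t cp) {
        out += cp < 0x80 ? static_cast<char>(cp) : '?';
    }
};

// Compile-time flavor: the encoding is part of the type, so the call
// resolves statically.
template <class Encoding>
class ct_string {
public:
    void encode(const std::vector<std::uint32_t>& source, Encoding = Encoding()) {
        bytes_.clear();
        for (std::uint32_t cp : source) Encoding::append(bytes_, cp);
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Run-time flavor: same call syntax, but the tag is wrapped in an object
// with a virtual function, so the encoding can change between calls.
class rt_string {
    struct encoder_base {
        virtual ~encoder_base() {}
        virtual void append(std::string& out, std::uint32_t cp) const = 0;
    };
    template <class E>
    struct encoder_model : encoder_base {
        void append(std::string& out, std::uint32_t cp) const override {
            E::append(out, cp);
        }
    };
public:
    template <class Encoding>
    void encode(const std::vector<std::uint32_t>& source, Encoding) {
        encoder_.reset(new encoder_model<Encoding>());
        bytes_.clear();
        for (std::uint32_t cp : source) encoder_->append(bytes_, cp);
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::unique_ptr<encoder_base> encoder_;
    std::string bytes_;
};
```

With this shape, `s.encode(source, utf8())` reads the same for both classes; only `rt_string` can later be re-encoded with a different tag such as `ascii()`.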
- Would it be possible to merge the ct/rt classes into a single type?
This would definitely be possible. Assuming I can make the interface identical, I could just make a special "encoding type" for ct_strings to make them behave like rt_strings do now.
- Maybe encode/decode should be free functions - algorithm like.
You might have something like:
    estring<> s = ...;   // Create an encodeable string with some default encoding (ascii?)
    encode(s, utf8());   // utf8 is a functor object that returns a utf8_encoder object.
I guess if you go this way, the estring class would just contain an encoded string associated with the encoder type. It might be an interesting approach. Still, a good start.
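One possible reading of the free-function idea, sketched under several assumptions: `estring` stores raw bytes tagged with an encoder type, `utf8` is a functor returning a `utf8_encoder`, and `encode` re-encodes the contents and returns a new string, since the encoder is part of the type and cannot change in place. Latin-1 stands in as the default encoding purely so the re-encoding visibly changes the bytes; the UTF-8 decoder assumes well-formed input. None of these names or choices come from an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical encoder: Latin-1, one byte per code point.
struct latin1_encoder {
    static std::vector<std::uint32_t> decode(const std::string& bytes) {
        std::vector<std::uint32_t> cps;
        for (unsigned char b : bytes) cps.push_back(b);
        return cps;
    }
    static std::string encode(const std::vector<std::uint32_t>& cps) {
        std::string out;
        for (std::uint32_t cp : cps)
            out += cp < 0x100 ? static_cast<char>(cp) : '?';
        return out;
    }
};

// Hypothetical encoder: UTF-8 (decoder assumes well-formed input).
struct utf8_encoder {
    static std::vector<std::uint32_t> decode(const std::string& bytes) {
        std::vector<std::uint32_t> cps;
        for (std::size_t i = 0; i < bytes.size();) {
            unsigned char lead = static_cast<unsigned char>(bytes[i++]);
            int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
            std::uint32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
            while (extra-- > 0 && i < bytes.size())
                cp = (cp << 6) | (static_cast<unsigned char>(bytes[i++]) & 0x3F);
            cps.push_back(cp);
        }
        return cps;
    }
    static std::string encode(const std::vector<std::uint32_t>& cps) {
        std::string out;
        for (std::uint32_t cp : cps) {
            assert(cp <= 0x10FFFF);  // outside the Unicode code space
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};

// Functor that returns the encoder object, as described in the message.
struct utf8 {
    utf8_encoder operator()() const { return utf8_encoder(); }
};

// An encoded byte string tagged with its encoder type.
template <class Encoder = latin1_encoder>
class estring {
public:
    explicit estring(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
    std::vector<std::uint32_t> code_points() const { return Encoder::decode(bytes_); }
private:
    std::string bytes_;
};

// Free-function "encode": decodes from the source encoding and encodes to
// the target, returning a string tagged with the new encoder type.
template <class From, class ToFunctor>
auto encode(const estring<From>& s, ToFunctor target)
    -> estring<decltype(target())> {
    typedef decltype(target()) to_encoder;
    return estring<to_encoder>(to_encoder::encode(s.code_points()));
}
```

Because the result has a different type from the input, this version cannot modify `s` in place; a design that merely retags the string, or one that changes encoding at run time, would need a different representation.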
Do you envision the encode algorithm re-encoding the contents of s into a new encoding, or just tagging s with a "utf8" encoding? Perhaps a better verb for "encode" would have been "transcode", since it's responsible for decoding from a source and encoding to a target. "encode" sounds better though. :)

- Jim