
Phil Endecott wrote:
Dear All,
Something that I have been thinking about for a while is storing strings tagged with their character set. Since I now have a practical need for this I plan to try to implement something. Your feedback would be appreciated.
Hi,

I've played around with this concept a lot already. I basically think that encoding-bound strings are a MUST for proper, safe, internationalized string handling. Everything else, in particular the current situation, is a mess. If you want, I can package up what I've done so far (not really much, but a lot of comments containing concepts) and put it somewhere.

One thing: I think runtime-tagged strings are useless. Programming should happen with one or at most two fixed encodings, known at compile time. Because encodings differ so much in behaviour (code units of 8, 16 or 32 bits, the wider ones in either byte order, fixed-length vs. variable-length encodings, ...), it is not good to write one type that handles them all at runtime.

I think that runtime-specified string conversion should be an I/O question. In other words, when character data enters your program, you convert it to the encoding you use internally; when it leaves the program, you convert it to an external encoding. In between, you use whatever your program uses, and you specify it at compile time.

I'd be willing to cooperate on this project, too. I'm mostly busy with my new I/O stuff, but the tagged strings form the foundation of the text I/O part, so I need the character library sooner or later anyway.

Sebastian Redl
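
To illustrate the compile-time tagging idea described above, here is a minimal sketch. It is not the code mentioned in the message: the names `tagged_string`, `latin1`, `utf8`, `utf16` and `to_utf8` are all hypothetical, and the only conversion shown is the simple Latin-1 to UTF-8 case, standing in for whatever happens at the I/O boundary.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical encoding tags. Each one names the code unit type it uses.
struct latin1 { using unit = char; };      // fixed-length, 8-bit units
struct utf8   { using unit = char; };      // variable-length, 8-bit units
struct utf16  { using unit = char16_t; };  // variable-length, 16-bit units

// A string whose encoding is part of its type and so fixed at compile time.
template <typename Encoding>
class tagged_string {
public:
    using unit_type = typename Encoding::unit;
    using storage_type = std::basic_string<unit_type>;

    tagged_string() = default;
    explicit tagged_string(storage_type units) : units_(std::move(units)) {}

    const storage_type& units() const { return units_; }
    std::size_t unit_count() const { return units_.size(); }

private:
    storage_type units_;
};

// Same code unit type, different encoding: the two are still distinct types,
// so they cannot be mixed up by accident.
static_assert(!std::is_same<tagged_string<latin1>, tagged_string<utf8>>::value,
              "encodings must yield distinct string types");

// Conversion at the input boundary: external Latin-1 data becomes the
// internal UTF-8 representation (assumed here to be the program's choice).
inline tagged_string<utf8> to_utf8(const tagged_string<latin1>& in) {
    std::string out;
    for (char c : in.units()) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));  // ASCII maps to itself
        } else {
            // U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return tagged_string<utf8>(std::move(out));
}

// Internal code accepts only the internal encoding; passing a
// tagged_string<latin1> here would be a compile-time error.
inline std::size_t internal_unit_count(const tagged_string<utf8>& s) {
    return s.unit_count();
}
```

With this shape, a call such as `internal_unit_count(to_utf8(input))` compiles, while `internal_unit_count(input)` on a `tagged_string<latin1>` does not, which is the kind of safety the encoding-bound approach is after.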