
"Erik Wien" <wien@start.no> wrote in message news:d19pdf$jhu$1@sea.gmane.org...
Thorsten Ottosen wrote:
Hi Erik,
Is it entirely improper to make Unicode strings a typedef for std::basic_string<...>?
|Not entirely, but certainly less than optimal. basic_string (and the
|iostreams) make assumptions that don't necessarily apply to Unicode text.
|One of them is that strings can be represented as a sequence of equally
|sized characters. Unicode can be represented that way, but that would
|mean you'd have to use 32 bits per character to be able to represent all
|the code points assigned in the Unicode standard. In most cases, that is
|way too much overhead for a string, and usually also a waste, since
|Unicode code points rarely require more than 16 bits to be encoded. You
|could of course implement Unicode for 16-bit characters in basic_string,
|but that would require that the user know about things like surrogate
|pairs, and also know how to correctly handle them. An unlikely scenario.

I'm not sure I get this, probably because I just don't know enough about
this subject. OK, so basic_string< char, char_traits<char>, allocator<char> >
makes assumptions. So what? I was implying that you should write a
specialization basic_string< char, utf_traits<char>, allocator<char> >:

    template< class T, class UTF >
    class basic_string< T, utf_traits<UTF>, std::allocator<T> >
    {
    public:
        basic_string() { }
        ...
    };

    typedef basic_string< char, utf_traits<utf8> > utf8_string;

What is it you wouldn't be able to do with this interface?

|Normally I would not think so, and my first implementation did not work
|this way. That one was implemented with the entire string class being
|templated on encoding, thereby eliminating the whole implementation
|inheritance tree in this implementation.
|
|There was however (as far as I could tell at least) some concern about
|this approach in the other thread. (Mostly related to code size and

hm... the function is only going to be used by 3 different classes, right?
If so, that is at most 3 times the size of a virtual function solution; v-tables
fill up too, and virtual functions in a class template can have a *large*
code size impact if not all virtual functions are used. (So are they?)

|being locked into an encoding at compile time.)

Sometimes strong type safety is good; sometimes it's not.

| Some thought that could
|be a problem for XML parsers and related technology that needs to
|establish encoding at run-time. (When reading files, for example.)

OK, that seems to motivate that some form of dynamic types should be there.

| This
|new implementation was simply a test to see if an alternate solution
|could be found, without those drawbacks. (It has a plethora of new ones
|though.)
|I am more than willing to change this if the current design is no good.
|Starting a discussion on this is one of my main reasons for posting the
|code in the first place.

It seems to me that we then need four classes:

    utf8_string
    utf16_string
    utf32_string
    utf_string // the dynamic one