
Thorsten Ottosen wrote:
| Hi Erik,

Hi! Thanks for your reply.
Let me first say that it's good to see progress being made on this important topic.
Here are just some small comments; I didn't follow the first discussion, so maybe these things have already been answered.
| Current design:
| The current design is based around the concept of "encoding_traits".
| Is it entirely improper to make Unicode strings a typedef for std::basic_string<...>?
Not entirely, but certainly less than optimal. basic_string (and the iostreams) make assumptions that don't necessarily apply to Unicode text. One of them is that strings can be represented as a sequence of equally sized characters. Unicode can be represented that way, but that would mean you'd have to use 32 bits per character to be able to represent all the code points assigned in the Unicode standard. In most cases that is far too much overhead for a string, and usually also a waste, since Unicode code points rarely require more than 16 bits to encode.

You could of course implement Unicode for 16-bit characters in basic_string, but that would require the user to know about things like surrogate pairs, and to know how to handle them correctly. An unlikely scenario. By using encoding_traits, however, we are able to make a string class that internally works with 8-, 16- or 32-bit code units (UTF-8, UTF-16 and UTF-32 respectively), but whose external interface uses 32-bit code points, abstracting away the underlying encoding. Done that way, we easily halve the effective size of a string for most users (when using UTF-16, for example).
| And what is the benefit of having a function vs. a function template? Surely a function template will look the same to the client as an ordinary function. Do people often need to change encoding on the fly?
Normally I would not think so, and my first implementation did not work this way. In that one the entire string class was templated on the encoding, which eliminated the whole implementation inheritance tree found in this implementation. There was, however (as far as I could tell at least), some concern about this approach in the other thread, mostly related to code size and to being locked into an encoding at compile time. Some thought that could be a problem for XML parsers and related technology that needs to establish the encoding at run time (when reading files, for example).

This new implementation was simply a test to see whether an alternative solution could be found without those drawbacks. (It has a plethora of new ones, though.) I am more than willing to change this if the current design is no good; starting a discussion on this is one of my main reasons for posting the code in the first place.
| You do however gain speed (I would assume), since you wouldn't have the overhead of virtual function calls, as well as a less complex implementation.
| It would be good to see some real data on how much slower it gets. If the slowdown is significant, then you should consider a two-layered approach (implementing the virtual functions in terms of non-virtual ones) or removing the virtual functions altogether.
Yep. Some profiling of the different designs would be a good idea, and will probably be done in the near future.
| -Thorsten
- Erik