
"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:cl55cd$9ei$1@sea.gmane.org...
Why do you need the traits, at compile-time?
Perhaps I didn't state this clearly enough. The traits class is one of the template parameters of the encoded_string class (it defaults to encoding_traits<encoding>). The traits class contains all the information about the specified encoding, such as the code unit size and the functions for iterating through a code unit sequence. All encoding-specific implementation is done in the traits class.
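To make the design concrete, here is a minimal sketch of a string class that takes its traits as a template parameter defaulting to encoding_traits<Encoding>. Only the names encoded_string and encoding_traits come from the post. The tag types utf16/utf32, the traits members code_unit and next, and the code_units()/code_points() functions are hypothetical illustrations, and the UTF-16 decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical encoding tag types.
struct utf16 {};
struct utf32 {};

// Primary template left undefined, so an unsupported encoding fails to compile.
template <class Encoding> struct encoding_traits;

template <> struct encoding_traits<utf16> {
    using code_unit = char16_t;  // 16-bit code units
    // Decode one code point starting at p and advance p past it.
    // Assumes well-formed UTF-16 (every high surrogate is followed by a low one).
    static char32_t next(const code_unit*& p) {
        char32_t c = *p++;
        if (c >= 0xD800 && c <= 0xDBFF) {  // high surrogate: combine with the low one
            char32_t lo = *p++;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        }
        return c;
    }
};

template <> struct encoding_traits<utf32> {
    using code_unit = char32_t;  // one code unit per code point
    static char32_t next(const code_unit*& p) { return *p++; }
};

// All encoding-specific work is delegated to Traits.
template <class Encoding, class Traits = encoding_traits<Encoding>>
class encoded_string {
public:
    using code_unit = typename Traits::code_unit;

    explicit encoded_string(std::basic_string<code_unit> units)
        : units_(std::move(units)) {}

    // Storage length, in code units.
    std::size_t code_units() const { return units_.size(); }

    // Number of code points, found by walking the sequence with Traits::next.
    std::size_t code_points() const {
        std::size_t n = 0;
        const code_unit* p = units_.data();
        const code_unit* end = p + units_.size();
        while (p < end) {
            Traits::next(p);
            ++n;
        }
        return n;
    }

private:
    std::basic_string<code_unit> units_;
};
```

For example, `encoded_string<utf16>` built from `u"a\U0001F600"` holds 3 code units but 2 code points, since U+1F600 needs a surrogate pair. The same text as `encoded_string<utf32>` holds 2 of each.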
- Why would the user want to change the encoding? Especially between UTF-16 and UTF-32?
Well, different people have different needs. If you are mostly using ASCII characters and need a small memory footprint, UTF-8 would fit the bill. If you need the best general performance on most operations, use UTF-16. If you need fast iteration over code points and size doesn't matter, use UTF-32.
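The size side of that trade-off follows directly from how many code units each encoding needs per code point. This sketch computes the storage, in bytes, of a sequence of code points in each of the three encodings. The function names are hypothetical, and the input is assumed to contain only valid Unicode scalar values. It says nothing about speed, only about size.

```cpp
#include <cstddef>
#include <string>

// UTF-8 code units (bytes) needed for one code point.
std::size_t utf8_units(char32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// UTF-16 code units needed for one code point (2 means a surrogate pair).
std::size_t utf16_units(char32_t cp) { return cp < 0x10000 ? 1 : 2; }

struct encoded_bytes {
    std::size_t utf8, utf16, utf32;
};

// Total storage in bytes for the same text in each encoding.
encoded_bytes storage_bytes(const std::u32string& text) {
    encoded_bytes b{0, 0, 0};
    for (char32_t cp : text) {
        b.utf8 += utf8_units(cp);        // 1 byte per UTF-8 code unit
        b.utf16 += 2 * utf16_units(cp);  // 2 bytes per UTF-16 code unit
        b.utf32 += 4;                    // always 4 bytes per code point
    }
    return b;
}
```

For `U"hello"` this gives 5, 10 and 20 bytes, which is why UTF-8 suits mostly-ASCII text. For the two CJK characters `U"\u65E5\u672C"` it gives 6, 4 and 8 bytes, so UTF-8 is not always the smallest.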
- Why would the user want to specify the encoding at compile time? Are there performance benefits to that? Basically, if we agree that UTF-32 is not needed, then UTF-16 is the only encoding which does not require complex handling. Maybe, for other encodings, using virtual functions in the character iterator is OK? And if iterators have "abstract characters" as value_type, maybe the overhead of that is much larger than a virtual function call, even for UTF-16.
Though I haven't confirmed this by testing, I would assume that templating the encoding, and thus specifying it at compile time, results in better performance, since you avoid the overhead of virtual function calls. (Run-time polymorphism would probably be needed if templates were scrapped.) Avoiding virtual calls also lets the compiler optimize (inline) more thoroughly, which is very beneficial here because string manipulation needs a large number of small, specialized functions.
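The difference between the two approaches can be sketched side by side. Below, the same UTF-16 code point count is written once with the decoder as a template parameter (a direct call that the compiler is free to inline) and once through a virtual interface (an indirect call through the vtable, which generally is not inlined unless the compiler can devirtualize it). All names are hypothetical, the decoder assumes well-formed input, and nothing here measures performance; it only shows where the dispatch happens.

```cpp
#include <cstddef>
#include <string>

// Decoding policy used at compile time. Assumes well-formed UTF-16.
struct utf16_decoder {
    static char32_t next(const char16_t*& p) {
        char32_t c = *p++;
        if (c >= 0xD800 && c <= 0xDBFF) {  // high surrogate: combine with the low one
            char32_t lo = *p++;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        }
        return c;
    }
};

// Static dispatch: Decoder::next is resolved at compile time.
template <class Decoder>
std::size_t count_static(const std::u16string& s) {
    std::size_t n = 0;
    const char16_t* p = s.data();
    const char16_t* end = p + s.size();
    while (p < end) {
        Decoder::next(p);
        ++n;
    }
    return n;
}

// Dynamic dispatch: the decoder is selected at run time via a virtual call.
struct decoder_base {
    virtual ~decoder_base() = default;
    virtual char32_t next(const char16_t*& p) const = 0;
};

struct utf16_virtual_decoder : decoder_base {
    char32_t next(const char16_t*& p) const override { return utf16_decoder::next(p); }
};

std::size_t count_dynamic(const std::u16string& s, const decoder_base& d) {
    std::size_t n = 0;
    const char16_t* p = s.data();
    const char16_t* end = p + s.size();
    while (p < end) {
        d.next(p);  // one indirect call per code point
        ++n;
    }
    return n;
}
```

Both functions return 3 for `u"a\U0001F600b"`. The cost of the virtual version is paid once per code point, which is why the per-call overhead matters for small, frequently called functions like this one.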
(As a side note, a discussion of templated vs. non-templated interfaces seems a reasonable addition to a thesis. It's a sure thing that if anybody wrote such a thesis in our lab, he would be asked to justify such global decisions.)
Thanks for the tip! I would probably include a discussion of why templates are used if they end up in the final implementation.
- What if the user wants to specify the encoding at run time? For example, XML files specify their encoding explicitly. I'd want to use ASCII/UTF-8 encoding if the XML document is 8-bit, and UTF-16 when it's Unicode.
That is one problem with templating on the encoding. You would either have to template all the file-scanning functions in the XML parser on the encoding as well, or do some run-time checks and use the correct template instantiation depending on the encoding used in the file. This is of course not ideal, but it is only an issue where the encoding is specified at run time. Which scenario is the most common needs to be determined before a final design is decided on.
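The second option, a run-time check that picks a template instantiation, can be kept to a single switch at the top of the parser. In this sketch, parse_document<Encoding> is a hypothetical stand-in for a parser templated on the encoding (it only reports the encoding and the number of code units), and detect_encoding is a deliberately simplified detector that looks only for a UTF-16 byte order mark; a real XML parser would also read the encoding declaration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical encoding tags; only the code unit size matters here.
struct utf8  { using code_unit = char;     static const char* name() { return "UTF-8"; } };
struct utf16 { using code_unit = char16_t; static const char* name() { return "UTF-16"; } };

// Stand-in for a parser compiled for one fixed encoding: reports how many
// code units follow the byte order mark (if any).
template <class Encoding>
std::string parse_document(const std::vector<unsigned char>& bytes, std::size_t bom_size) {
    std::size_t units = (bytes.size() - bom_size) / sizeof(typename Encoding::code_unit);
    return std::string(Encoding::name()) + ":" + std::to_string(units);
}

enum class detected { utf8, utf16 };

// Simplified detection: only recognizes a UTF-16 byte order mark (LE or BE).
detected detect_encoding(const std::vector<unsigned char>& bytes) {
    if (bytes.size() >= 2 &&
        ((bytes[0] == 0xFF && bytes[1] == 0xFE) || (bytes[0] == 0xFE && bytes[1] == 0xFF)))
        return detected::utf16;
    return detected::utf8;
}

// The only run-time branch: choose the instantiation once, then run code
// that is fully specialized for that encoding.
std::string parse_any(const std::vector<unsigned char>& bytes) {
    switch (detect_encoding(bytes)) {
        case detected::utf16: return parse_document<utf16>(bytes, 2);
        case detected::utf8:  break;
    }
    return parse_document<utf8>(bytes, 0);
}
```

The cost of this approach is code size: every encoding the switch supports instantiates its own copy of the templated parser.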