
Soares Chen Ruo Fei wrote:
A while ago I gave some previews of my Unicode String Adapter library to the Boost community, but I didn't receive much feedback. Now that GSoC is ending, I'd like you all to take a look at my project again and provide feedback on the usefulness of the library. The links to my project repository and documentation follow:
GitHub repository: https://github.com/crf00/boost.ustr
Documentation: http://crf.scriptmatrix.net/ustr/index.html
I think there are probably as many ways to implement a "better" string as there are potential users, and previous long discussions here have considered those possibilities at great length. In summary, your proposal is for a string that:

- Is immutable.
- Is reference counted.
- Is iterated by default over Unicode code points.
- Provides access to the code units via operator* and operator->, i.e.

    s.begin()    // Returns a code point iterator.
    s->begin()   // Returns a code unit iterator.

I won't comment on the merits or otherwise of those points, apart from the last, where I'll note that it is not to my taste. It looks "over-clever". Imagine that I wrote some code using your library, and then a colleague who was not familiar with it had to look at it later. Would they have any idea about the difference between those two cases? No, not unless I added a comment every time I used it. Please let's have an obvious syntax like:

    s.begin()        // Code points.
    s.impl.begin()   // Code units.

or

    s.units_begin()  // Code units.

Personally, I don't want a new clever string class. What I want is a few well-written building blocks for Unicode. For example, I'd like to be able to iterate over the code points in a block of UTF-8 data in raw memory, so some sort of iterator adaptor is needed. Your library does have this functionality, but it is hidden in an implementation detail. Please can you consider bringing out your core UTF encoding and decoding functions to the public interface?

I would also like to see some benchmarks for the core UTF conversion functions. If you post some benchmarks that decouple the UTF conversion from the rest of the string class, I will compare the performance with my own code.

Regards, Phil.
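
To make the "building block" request above concrete, here is a minimal sketch of a code point iterator over UTF-8 data in raw memory. The names `utf8_code_point_iterator` and `decode_all` are hypothetical and are not taken from Boost.Ustr; the error policy (yield U+FFFD and skip one byte on malformed input) is also an assumption, chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical adaptor: walks a [pos, end) range of UTF-8 code units
// and yields one Unicode code point per step.
class utf8_code_point_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_code_point_iterator(const unsigned char* pos, const unsigned char* end)
        : pos_(pos), end_(end) {}

    char32_t operator*() const { return decode().cp; }

    utf8_code_point_iterator& operator++() {
        pos_ += decode().len;
        return *this;
    }

    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    struct decoded { char32_t cp; std::size_t len; };

    // Decodes the sequence at pos_. Assumed policy: any malformed,
    // truncated, overlong, or surrogate sequence yields U+FFFD and
    // consumes exactly one byte.
    decoded decode() const {
        const std::size_t avail = static_cast<std::size_t>(end_ - pos_);
        assert(avail > 0);  // must not dereference the end iterator
        const decoded bad = {0xFFFD, 1};
        const unsigned char b0 = pos_[0];
        if (b0 < 0x80) return {static_cast<char32_t>(b0), 1};

        std::size_t len = 0;
        char32_t cp = 0;
        char32_t min = 0;  // smallest value allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return bad;  // stray continuation byte or invalid lead byte

        if (len > avail) return bad;  // truncated sequence
        for (std::size_t i = 1; i < len; ++i) {
            if ((pos_[i] & 0xC0) != 0x80) return bad;
            cp = (cp << 6) | (pos_[i] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return bad;
        return {cp, len};
    }

    const unsigned char* pos_;
    const unsigned char* end_;
};

// Hypothetical helper: collects every code point in a raw memory block.
std::vector<char32_t> decode_all(const void* data, std::size_t size) {
    const unsigned char* first = static_cast<const unsigned char*>(data);
    utf8_code_point_iterator it(first, first + size);
    const utf8_code_point_iterator last(first + size, first + size);
    std::vector<char32_t> out;
    for (; it != last; ++it) out.push_back(*it);
    return out;
}
```

Because the adaptor takes only a pair of pointers, it works on a memory-mapped file or a network buffer without copying the data into a string class first. It also gives the conversion code a standalone entry point, which is what a benchmark decoupled from the string class would need.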