
Soares Chen Ruo Fei wrote:
Hi Phil,
On Aug 9, 2011, Phil Endecott wrote:
I think there are probably as many ways to implement a "better" string as there are potential users, and previous long discussions here have considered those possibilities at great length. In summary your proposal is for a string that is:
- Immutable.
- Reference counted.
- Iterated by default over Unicode code points.
I think you misunderstood my point.
No, I believe I understand what you are doing.
Boost.Ustr does not attempt to design yet another string class. Instead it wraps an existing string class, supplied through the template parameter, and relies on that class for the actual container operations.
No, because:
The immutability of the string adapter is actually achieved by holding a smart pointer to the const version of the raw string.
If you were just wrapping an existing string class, you wouldn't do that; you'd just wrap the existing string class. By adding this extra bit, you're making a string that is immutable, copy-on-write and reference counted, whether or not the underlying string is.
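To make the point above concrete, here is a minimal sketch of an adapter that holds a smart pointer to a const string. The class name `immutable_adapter` and its members are hypothetical and are not taken from Boost.Ustr; the sketch only illustrates that such a wrapper is immutable and reference counted regardless of how the wrapped string type behaves.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical adapter, NOT Boost.Ustr's actual class: every copy shares
// one immutable buffer through std::shared_ptr<const StringT>.
template <typename StringT = std::string>
class immutable_adapter {
public:
    explicit immutable_adapter(StringT s)
        : data_(std::make_shared<StringT>(std::move(s))) {}

    // Read-only access to the wrapped string.
    const StringT& get() const { return *data_; }

    // Number of adapters currently sharing the buffer.
    long use_count() const { return data_.use_count(); }

    // "Mutation" builds a new buffer; existing copies are unaffected.
    immutable_adapter append(const StringT& tail) const {
        return immutable_adapter(*data_ + tail);
    }

private:
    std::shared_ptr<const StringT> data_;
};
```

Copying an `immutable_adapter` only bumps the reference count, while `append` allocates a fresh string, so these semantics come from the adapter itself rather than from `StringT`.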
- Provides access to the code units via operator* and operator->, i.e.
    s.begin()   // Returns a code point iterator.
    s->begin()  // Returns a code unit iterator.
I won't comment about the merits or otherwise of those points, apart from the last, where I'll note that it is not to my taste. It looks like it's "over clever". Imagine that I wrote some code using your library, and then a colleague who was not familiar with it had to look at it later. Would they have any idea about the difference between those two cases? No, not unless I added a comment every time I used it. Please let's have an obvious syntax like:
    s.begin()        // Code points.
    s.impl.begin()   // Code units.
or
    s.units_begin()  // Code units.
The intention of operator->() is not to provide access to a code unit iterator. Instead, it lets programmers reach raw string functionality that unicode_string_adapter is not able to provide.
Whatever. The point is that you have this operator* and operator-> overload whose purpose is non-obvious to someone looking at code that uses it. What is your rationale for doing that, rather than providing e.g. an impl() or base() or similar accessor? Can you give examples of any precedents for this usage? What names or syntax do other wrapper/adaptor/facade implementations use?
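For comparison, the two styles under discussion can be sketched side by side. The class `string_adapter_sketch` and the names `base()`, `units_begin()` and `units_end()` are assumptions chosen for illustration, not part of Boost.Ustr; one well-known precedent for a named accessor is the `base()` member of `std::reverse_iterator`.

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper used only to compare accessor styles.
template <typename StringT = std::string>
class string_adapter_sketch {
public:
    explicit string_adapter_sketch(StringT s) : impl_(std::move(s)) {}

    // Named accessor: the call site says what it is reaching for.
    const StringT& base() const { return impl_; }

    // Named code-unit iteration.
    typename StringT::const_iterator units_begin() const { return impl_.begin(); }
    typename StringT::const_iterator units_end() const { return impl_.end(); }

    // The operator-> style under discussion: s->size() forwards to the
    // wrapped string, which is not obvious to a reader new to the class.
    const StringT* operator->() const { return &impl_; }

private:
    StringT impl_;
};
```

With this sketch, `s.base().size()` and `s->size()` return the same value; the difference is only in how much a reader has to know to interpret the call.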
Your library does have [raw UTF encoding and decoding functions], but they are hidden in an implementation detail. Please can you consider bringing out your core UTF encoding and decoding functions to the public interface?
My encoder/decoder functions are actually quite similar to Mathias' implementation (in fact I referred to his design before implementing my own). However, these function interfaces are specifically designed to fit the internal usage of Boost.Ustr, although I made them generic enough. I did not directly use or copy Mathias' implementation because the interfaces are slightly different and I wanted to avoid obscure bugs, because the algorithm is simple enough to re-implement, and because I wanted to take this chance to learn the encoding algorithms (and I did learn something). :) But I'd agree that it shouldn't be hard to refactor the encoders and merge them with Mathias' implementation when the time comes.
Currently I do not plan to build iterator adapters on top of these encoding/decoding functions, and I think that would also be a bit redundant, as Mathias has already gone through the mess of generating these functions using macros and template metaprogramming. ;)
Well I don't really care who does it, but I think we should have these UTF encoding and decoding functions somewhere in Boost that is not an implementation detail of some other library.
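As an illustration of what such standalone functions might look like, here is a minimal UTF-8 sketch written for readability rather than speed. The names `utf8_encode` and `utf8_decode` and their signatures are assumptions made for this example; they are not the interfaces of Boost.Ustr or of Mathias' library.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
// Returns false (appending nothing) for surrogates and values above U+10FFFF.
inline bool utf8_encode(char32_t cp, std::string& out) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Decodes one code point starting at in[pos] and advances pos on success.
// Returns false on truncated, overlong, or otherwise malformed input.
inline bool utf8_decode(const std::string& in, std::size_t& pos, char32_t& cp) {
    if (pos >= in.size()) return false;
    const unsigned char lead = static_cast<unsigned char>(in[pos]);
    std::size_t len;
    char32_t value, min;  // min = smallest code point allowed for this length
    if (lead < 0x80)                { len = 1; value = lead;        min = 0; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; value = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; value = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; value = lead & 0x07; min = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte
    if (in.size() - pos < len) return false;
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(in[pos + i]);
        if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
        value = (value << 6) | (c & 0x3F);
    }
    if (value < min || value > 0x10FFFF) return false;
    if (value >= 0xD800 && value <= 0xDFFF) return false;
    cp = value;
    pos += len;
    return true;
}
```

Because both functions operate on plain `std::string` and `char32_t`, they can be tested and benchmarked independently of any string class built on top of them.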
I would also like to see some benchmarks for the core UTF conversion functions. If you post some benchmarks that decouple the UTF conversion from the rest of the string class, I will compare the performance with my own code.
At this time I am focusing on design issues rather than optimizations, so I didn't think much about benchmarks. I'd guess that the encoding/decoding speed is probably inferior to other encoder/decoder functions. You can see in my implementation that I did not use obscure hacks that shorten the code while remaining mathematically equivalent. Instead I focused on readability first, so that even amateurs can read the code and easily learn how the encoding/decoding process works. So if you are writing a performance-critical application that encodes/decodes huge amounts of Unicode text, I'd say that Boost.Ustr is probably not for you (yet).
OK, it's not for me, that's a shame. Maybe if you're lucky someone who DOES want this functionality will now post a reply to your request for comments... Regards, Phil.