
On Wed, Jan 26, 2011 at 3:22 PM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
On Wed, Jan 26, 2011 at 5:01 PM, Matus Chochlik <chochlik@gmail.com> wrote: [snip/]
I didn't say that I regard the immutability or value semantics to be an implementation detail. But some part of the discussion focused on if we should employ COW, how to implement it, etc.
Sure, which is also where the reference counting implementation lies. Details like that are deal-breakers in performance-critical code and if we're talking about replacing std::string or implementing a competing string, it would have to beat the std::string performance (however bad/good that is).
Value semantics - a part of the interface specification - can be implemented in a number of ways.
I don't see though how else value semantics can be implemented aside from having: a default constructor, an assignment operator, a copy constructor, and later on maybe an optimal move constructor. That's really all there is to value semantics -- some would argue that swap is "necessary" too but I'm not convinced that swap is really required for value semantics.
I may be wrong, but when I hear you say that a type has a default constructor, an assignment operator, and so on, my understanding is that you are talking about the interface of the type. When you explain how the assignment operator, etc. is implemented, then you are talking about implementation details :) [snip/]
So it's the algorithms that are the problem -- for being encoding agnostic -- and not really the string. Is that what you're implying?
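To make the interface/implementation distinction concrete, here is a minimal sketch. The class name `istring`, the `shared_ptr` storage, and the `shares_storage_with` helper are assumptions made for illustration only; nothing in this thread fixes such a design. Callers see only the default constructor, copy, assignment, and move, while the reference counting stays hidden behind them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string; name and layout are illustrative only.
class istring {
public:
    // Interface: the operations that make up value semantics.
    istring() = default;                              // empty string
    explicit istring(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}
    istring(const istring&) = default;                // copy
    istring& operator=(const istring&) = default;     // assignment
    istring(istring&&) = default;                     // move
    istring& operator=(istring&&) = default;

    // A null pointer stands for the empty string (default or moved-from).
    const std::string& str() const {
        static const std::string empty;
        return data_ ? *data_ : empty;
    }

    // Exposed only so the demo can observe the implementation detail.
    bool shares_storage_with(const istring& other) const {
        return data_ == other.data_;
    }

private:
    // Implementation detail: copies share one reference-counted buffer.
    std::shared_ptr<const std::string> data_;
};

inline bool operator==(const istring& a, const istring& b) {
    return a.str() == b.str();
}

inline void demo_value_semantics() {
    istring a("hello");
    istring b = a;                       // copy construction
    istring c;                           // default construction
    c = a;                               // assignment
    assert(b == a && c == a);            // all three hold the same value
    assert(a.shares_storage_with(b));    // no character data was copied
    istring d = std::move(b);            // move construction
    assert(d == a);
    assert(b.str().empty());             // moved-from object reads as empty
}
```

Swapping the `shared_ptr` for a deep copy would change performance but not a single line of calling code, which is the sense in which the copy strategy is an implementation detail.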
[snip/]
1. This is totally fine with an immutable string implementation. I don't see any mutations going on here.
Me neither :-) What I see however is that it fails because of encoding.
I still don't understand this though. What does encoding have to do with the string? Isn't encoding a separate process?
Hm, my ability to express myself obviously totally su*ks :) You are completely right that the encoding is a completely separate process, and I'm saying that I want it to be *completely* hidden from my sight, unless it is absolutely necessary for me to be concerned about it :-) The means for this would be: let us build a string that may (or may not) be based on your general (encoding-agnostic) string. This string would handle the transcoding in most cases, without me having to view the underlying byte sequence through functors that require me to specify the encoding explicitly *every time*. By default I want UTF-8; if I talk to the OS, I say I want the string in an encoding that the OS expects, not that I want it in UTF-16, ISO-8859-2, KOI8-R, etc. If and only if I want to handle the string in an encoding other than Unicode should I have to specify that explicitly. [snip/]
How about Boost.RangeEx-wrapped STL algorithms?
I for one like the simplicity and flexibility of it, which may explain why I think we have different interpretations of "convenient". For me, iterators and layering operations on iterators, and then feeding them through algorithms, is the convenient route. Anything that resembles Java code or Smalltalk-like "OOP message-passing" inspired interfaces just doesn't seem enticing to my brain anymore.
This is a different matter. Again, I may be wrong, but I am under the impression that RangeEx was implemented to hide the ugliness of complex STL iterator-based algorithms. [snip/]
To be more "complete" about it though, using "+" on strings is really a misnomer. The "+" operator suggests commutativity, which string concatenation lacks -- and you're really not adding string values either. What you want is an operator that conveys "I'm joining the string on the left with the one on the right in the specified order" -- because the "^" operator is left associative and can be used as a joining symbol, it fits the use case for strings better.
So you read it as: "Foo" joined with "Bar" joined with ...
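A minimal sketch of the joining operator described above, assuming a hypothetical stand-in type `immutable_text` (not a proposed design). It shows that `^` chains left to right and produces new values without touching its operands.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal immutable string; a stand-in for illustration.
class immutable_text {
public:
    immutable_text(const char* s) : value_(s) {}
    explicit immutable_text(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }

private:
    const std::string value_;  // fixed at construction
};

// Joins lhs and rhs in the given order and returns a new value.
inline immutable_text operator^(const immutable_text& lhs,
                                const immutable_text& rhs) {
    return immutable_text(lhs.str() + rhs.str());
}

inline std::string demo_join() {
    const immutable_text foo("Foo");
    // Parsed as (foo ^ "Bar") ^ "Baz" because ^ is left associative.
    const immutable_text result = foo ^ "Bar" ^ "Baz";
    assert(foo.str() == "Foo");  // the left operand is unchanged
    return result.str();         // "FooBarBaz"
}
```

One thing to keep in mind with this choice: in C++ `^` has lower precedence than `+`, `<<`, and `==`, so an expression such as `std::cout << a ^ b` parses as `(std::cout << a) ^ b` and needs parentheses around the join.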
I know that of course because we are having this discussion, but will it be clear to someone who is not participating? It may become clear when the string gets wider adoption.
I still don't understand what "nice" is. I think precisely because "nice" is such a subjective thing I fear that without any objective criterion to base the definition of an interface/implementation on, we will keep chasing after what's "nice" or "convenient".
OTOH if we agree that algorithms are as important as the abstraction, then I think it's better if we agree what the correct abstraction is and what the algorithms we intend to implement/support are. In that discussion what's "nice" is largely a matter of taste. ;)
OK, I think that it is pointless to discuss "nice" :) exactly because it is very subjective.
Also, last time I checked, there are already a ton of Unicode-encoding libraries out there, so I don't see why there's a need for yet-another-encoding-library for character strings. This is why I like the way Boost.Locale is handling it: it conveys that the library is about making a common interface through which different back-ends can be plugged in. If Boost.Locale dealt with iterators, then I think having a string library that is better than std::string in more ways than one would give us a good way of tackling the cross-platform string encoding issue. But there I stress, I think C++ needs a better string implementation than the standard one.
And what is their level of acceptance by different APIs ?
I think we need to qualify what you refer to as APIs. If just judging from the amount of code that's written against Qt or MFC for example then I'd say "they're pretty well accepted". If you look at the libraries that use ICU as a backend I'd say we already have one in Boost called Boost.Regex. And there's all these other libraries in the Linux arena that have their own little niche to play in the Unicode game -- there's Glib, the GNOME and KDE libraries, ad nauseam.
Besides what you mentioned, an API for me is, for example, WINAPI, POSIX API, OpenGL API, OpenSSL API, etc. Basically all the functions "exported" by the various C/C++ libraries that I cannot imagine my life without :) and which expect not a generic iterator range or a view or whatnot, but a plain and simple pointer (const char*) pointing to a contiguous block of memory containing a zero-terminated C string, or, if we are luckier, a std::string.
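The constraint can be sketched as follows. Both `legacy_api_length` (a stand-in for any C function taking `const char*`) and `chunked_string` (a stand-in for a string that is not stored contiguously) are hypothetical names invented for this illustration. The point is only that a non-contiguous representation has to be flattened into a zero-terminated buffer before such an API can consume it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C API that accepts only a zero-terminated,
// contiguous character buffer.
inline std::size_t legacy_api_length(const char* zero_terminated) {
    return std::strlen(zero_terminated);
}

// Hypothetical string stored as separate chunks (not contiguous).
class chunked_string {
public:
    explicit chunked_string(std::vector<std::string> chunks)
        : chunks_(std::move(chunks)) {}

    // Copies all chunks into one contiguous std::string.
    std::string flatten() const {
        std::string out;
        for (const std::string& chunk : chunks_) {
            out += chunk;
        }
        return out;
    }

private:
    std::vector<std::string> chunks_;
};

inline std::size_t demo_legacy_call() {
    std::vector<std::string> parts;
    parts.push_back("Foo");
    parts.push_back("Bar");
    const chunked_string s(parts);

    // The C API cannot walk the chunks, so a contiguous copy is made first.
    const std::string flat = s.flatten();
    return legacy_api_length(flat.c_str());  // c_str() is zero terminated
}
```

The copy in `flatten()` is the cost that a range- or view-based string pays at every such API boundary, which is why the level of acceptance by existing APIs matters here.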
What opinion is there to be had? If the string is immutable why would you want to make it look like it is mutable?
Nobody forces you to use append/prepend, and you should not force others to use the operator ^.
Well, the primitive data types force you to use the operators defined on them. Spirit forces you to define rules using the DSEL. So does the MSM library. The BGL forces you to use the graph abstraction if you intend to deal with that library.
I don't see why it's unreasonable to force operator^ for consistency's sake.
IMO in this case you are even at an advantage, because append/prepend/etc. would be wrappers around "your" :) interface. And, yes, they should be clearly documented as such.
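Such wrappers could look roughly like this. It is a sketch under the assumption of a hypothetical `immutable_text` type with an `operator^`; the free functions `append` and `prepend` are illustrative names that return new values rather than modifying their arguments.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal immutable string; a stand-in for illustration.
class immutable_text {
public:
    immutable_text(const char* s) : value_(s) {}
    explicit immutable_text(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }

private:
    const std::string value_;  // fixed at construction
};

// The primitive: join two values in the given order.
inline immutable_text operator^(const immutable_text& lhs,
                                const immutable_text& rhs) {
    return immutable_text(lhs.str() + rhs.str());
}

// Convenience wrappers: the same join with the operand order spelled out.
inline immutable_text append(const immutable_text& base,
                             const immutable_text& suffix) {
    return base ^ suffix;
}

inline immutable_text prepend(const immutable_text& base,
                              const immutable_text& prefix) {
    return prefix ^ base;
}

inline void demo_wrappers() {
    const immutable_text bar("Bar");
    assert(append(bar, "Baz").str() == "BarBaz");
    assert(prepend(bar, "Foo").str() == "FooBar");
    assert(bar.str() == "Bar");  // neither call modified the original
}
```

Because both wrappers return a new value, the familiar names do not by themselves make the string mutable; whether they still suggest mutation to readers is the open question in this exchange.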
But the point of the thing being immutable is lost in translation. More to the point, operator^ has simple semantics as opposed to 'append' and 'prepend' which are two words for the same operation with just the order of the operands switched around.
Am I missing something here?
I see your point of view. You imagine this new string class to be a completely new beast. I, and I expect a few others, view it as the next std::string. I don't see any big point in creating another-uber-string that is *so much* better in performance, etc. etc., if it does not get wide adoption. There are already dozens of such strings. Best, Matus