
On Wed, Jan 26, 2011 at 5:01 PM, Matus Chochlik <chochlik@gmail.com> wrote:
On Wed, Jan 26, 2011 at 9:25 AM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
On Wed, Jan 26, 2011 at 3:47 PM, Matus Chochlik <chochlik@gmail.com> wrote:
On Fri, Jan 21, 2011 at 1:07 PM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
[snip/] I also prefer nothing too fancy. But most of these things are implementation details, let us get the interface right first and focus on the optimizations afterwards.
Actually, it's not an implementation detail. Value semantics has everything to do with the interface and not the implementation.
It's just that, at the time I was thinking about and writing this reply, I really just wanted something lightweight that allowed for unbridled cross-thread access. My original assumption that reference counting was a bad thing has since been clarified by others in the ensuing threads.
I didn't say that I regard the immutability or value semantics to be an implementation detail. But some part of the discussion focused on whether we should employ COW, how to implement it, etc.
Sure, which is also where the reference counting implementation lies. Details like that are deal-breakers in performance-critical code and if we're talking about replacing std::string or implementing a competing string, it would have to beat the std::string performance (however bad/good that is).
Value semantics - a part of the interface specification - can be implemented in a number of ways.
I don't see though how else value semantics can be implemented aside from having: a default constructor, an assignment operator, a copy constructor, and later on maybe an optimal move constructor. That's really all there is to value semantics -- some would argue that swap is "necessary" too but I'm not convinced that swap is really required for value semantics.
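For illustration, the list of operations above can be sketched roughly as follows. This is only a sketch under assumptions: the name `immutable_string` and the choice of a `std::shared_ptr` to a const buffer are hypothetical, not anything proposed in this thread, and reference counting is just one of several possible ways to back the same interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, for illustration only.
class immutable_string {
public:
    // Default constructor: an empty string, no allocation.
    immutable_string() = default;

    immutable_string(const char* s)
        : data_(std::make_shared<const std::string>(s)) {}

    // Copy constructor and assignment operator: copies share the buffer.
    // Nobody can mutate it, so sharing is indistinguishable from a deep copy.
    immutable_string(const immutable_string&) = default;
    immutable_string& operator=(const immutable_string&) = default;

    // Move operations: hand over the buffer without touching the count.
    immutable_string(immutable_string&&) noexcept = default;
    immutable_string& operator=(immutable_string&&) noexcept = default;

    // Read-only access; a default-constructed or moved-from object reads as empty.
    const std::string& str() const {
        static const std::string empty;
        return data_ ? *data_ : empty;
    }

    friend bool operator==(const immutable_string& a, const immutable_string& b) {
        return a.str() == b.str();
    }

private:
    std::shared_ptr<const std::string> data_;
};

inline void value_semantics_example() {
    immutable_string a("Foo");
    immutable_string b = a;          // copy
    immutable_string c;              // default
    c = b;                           // assignment
    immutable_string d = std::move(c);  // move
    assert(a == b && b == d);
}
```

Note that no `swap` appears in the sketch; whether one is needed is exactly the open question raised above.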
But we already have these everyday nice and convenient text handling algorithms in Boost.Algorithm's String_algo library.
But still it is encoding agnostic, which is bad in many cases.
So it's the algorithms that are the problem -- for being encoding agnostic -- and not really the string. Is that what you're implying?
As a matter of fact, *all* the implementations cited about dealing with UTF-8 and UTF-16 have everything to do with wrapping raw data into a view of it that (unfortunately) allows for mutating transformations.
Note also that I wasn't even going into the generic point of strings not being a sequence of anything other than characters to be read. That's a different topic that I don't want to get into at this time. But even the pedantic definition of a string doesn't include mutability as an intrinsic requirement.
I really do not have anything against the immutability and the value semantics, see above. I think you misunderstood me :)
I think I didn't understand what you meant when you referred to implementation details. ;)
A few things here:
1. This is totally fine with an immutable string implementation. I don't see any mutations going on here.
Me neither :-) What I see however is that it fails because of encoding.
I still don't understand this though. What does encoding have to do with the string? Isn't encoding a separate process?
3. String I/O can be defined independently of the string especially if you're dealing with C++ streams. I don't see why the above would be a problem with an immutable string implementation.
Agreed, but again it has to be convenient.
We may have different definitions of convenient, but I like the way string-streams and iostreams in the standard do it. For all intents and purposes a stringbuf implementation that deals with efficient allocation precisely for immutable strings would be nice to have.
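As a minimal sketch of the stream-based route described above, using only the standard `std::ostringstream` (the helper name `make_greeting` is hypothetical, and this does not show the specialized stringbuf wished for above, only the general pattern of building once and then treating the result as a fixed value):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: all construction happens inside the stream;
// the caller only ever sees the finished value.
inline std::string make_greeting(const std::string& name, int count) {
    std::ostringstream out;
    out << "Hello, " << name << "! You have " << count << " new messages.";
    return out.str();
}

inline void stream_building_example() {
    const std::string s = make_greeting("Matus", 3);
    assert(s == "Hello, Matus! You have 3 new messages.");
}
```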
So you'd say yuck to any STL algorithm that dealt with iterators? Have you used the Boost.Iterators library yet because then you'd be calling all those chaining/wrapping operations "yucky" too. ;)
Some of them ? Yes, in many situations.
How about Boost.RangeEx-wrapped STL algorithms? I for one like the simplicity and flexibility of it, which may explain why I think we have different interpretations of "convenient". For me, iterators and layering operations on iterators, and then feeding them through algorithms, is the convenient route. Anything that resembles Java code or Smalltalk-like "OOP message-passing" inspired interfaces just doesn't seem enticing to my brain anymore. Maybe that's more a problem with me than with the code, although the jury's still out on that one. ;)
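A small sketch of what "layering operations on iterators and feeding them through algorithms" can look like with only the standard library (the function name `reversed_upper` is hypothetical). Reverse iterators wrap the source range, `std::transform` applies the per-character operation, and the source string is only read, never modified. Note that `std::toupper` works per byte, so this particular example is itself encoding agnostic in the sense criticized earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical example: read the input back to front, upper-casing each
// byte on the way, and collect the result into a new string.
inline std::string reversed_upper(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::transform(in.rbegin(), in.rend(), std::back_inserter(out),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return out;
}

inline void iterator_layering_example() {
    const std::string source = "abc";
    assert(reversed_upper(source) == "CBA");
    assert(source == "abc");  // the input is untouched
}
```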
[snip/]
But the problem there is "nice" is really subjective. I absolutely abhor code like this:
boost::string s = "Foo"; s.append("Bar").append("Baz");
When I can express it entirely with fewer characters and more succinctly with this instead:
boost::string s = "Foo" ^ "Bar" ^ "Baz";
Agreed, this is a matter of opinion and while I see the beauty of what you propose, it may not be clear what you mean by "Foo" ^ "Bar". If I learned something from this whole discussion, then it is that it's not nice to shove anything (programming style included) down anyone's throat :-)
Right on both counts. :D To be more "complete" about it though, the semantics of "+" on strings is really a misnomer. The "+" operator signifies commutativity, which string concatenation lacks -- and you're really not adding string values either. What you want is an operator that conveys "I'm joining the string on the left with the one on the right in the specified order" -- because the "^" operator is left associative and can be used as a joining symbol, it fits the use case for strings better. So you read it as: "Foo" joined with "Bar" joined with ...
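A minimal sketch of such a joining operator, assuming a hypothetical `text` type (no `boost::string` with this interface exists; the names here are made up for illustration). One practical detail: C++ does not allow overloading an operator for two built-in operands, so an expression like "Foo" ^ "Bar" with two plain literals cannot compile as written; at least one operand has to be of class type.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical string-like type, for illustration only.
struct text {
    std::string value;
    text(const char* s) : value(s) {}                     // allows "Bar" on either side
    explicit text(std::string s) : value(std::move(s)) {}
};

// Join: the left operand followed by the right operand, producing a new value.
inline text operator^(const text& lhs, const text& rhs) {
    return text(lhs.value + rhs.value);
}

inline void join_example() {
    // Left associative: (text("Foo") ^ "Bar") ^ "Baz"
    text s = text("Foo") ^ "Bar" ^ "Baz";
    assert(s.value == "FooBarBaz");
}
```

Keep in mind that `^` binds less tightly than `==` and `<<`, so comparisons and stream insertions of a joined expression need parentheses.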
I think you're missing something here though.
The point of creating a new string implementation is so that you can generalize a whole family of string-related algorithms around a well-defined abstraction. In this case there's really no question that a string of characters is used to represent "text" -- although it can very well represent a lot of other things too. However you cut it, though, the abstraction is borne out of algorithms that have something to do with strings, like: concatenation, compression, ordering, encoding, decoding, rendering, sub-string, parsing, lexical analysis, search, etc.
And I think you misunderstand me, I *do not* want to stop us from doing such an implementation of string. But just as it is important for you to have the generic string class, it is important for me to have the "nice" 'text' class :) I don't even have anything against boost::text being implemented as a special case of boost::string if it is possible/wise.
I still don't understand what "nice" is. I think precisely because "nice" is such a subjective thing I fear that without any objective criterion to base the definition of an interface/implementation on, we will keep chasing after what's "nice" or "convenient". OTOH if we agree that algorithms are as important as the abstraction, then I think it's better if we agree what the correct abstraction is and what the algorithms we intend to implement/support are. In that discussion what's "nice" is largely a matter of taste. ;)
Also, last time I checked, there are already a ton of Unicode-encoding libraries out there, so I don't see why there's a need for yet-another-encoding-library for character strings. This is why I think I'm liking the way Boost.Locale is handling it, because it conveys that the library is about making a common interface through which different back-ends can be plugged in. If Boost.Locale dealt with iterators then I think having a string library that is better than std::string in more ways than one gives us a good way of tackling the cross-platform string encoding issue. But I stress, I think C++ needs a string implementation that is better than the standard one.
And what is their level of acceptance by different APIs ?
I think we need to qualify what you refer to as APIs. If just judging from the amount of code that's written against Qt or MFC for example then I'd say "they're pretty well accepted". If you look at the libraries that use ICU as a backend I'd say we already have one in Boost called Boost.Regex. And there's all these other libraries in the Linux arena that have their own little niche to play in the Unicode game -- there's Glib, the GNOME and KDE libraries, ad nauseam. I really think we don't want to be playing the "one ring in the darkness bind them" game here. If we want to change the way things are going, there's little point in preserving the status quo IMO, especially if we're all in agreement that the status quo is broken. And now that we've decided that it's something worth fixing, let's fix it in a way that's actually different from how everyone else has tried to do it before. Doing the same thing over and over and expecting a different result is insanity -- paraphrased from someone important whose name I really should know. ;)
I think this is a slippery slope though. If we make the boost::string look like something that is mutable without it being really mutable, then you have a disconnect between the interface and the semantics you want to convey.
Having member functions like 'append' and 'prepend' makes you think that you're modifying the string when in fact you're really building another string. I've already pointed out that string construction can very well be handled by the string streams so I don't think we want to encourage people to think of strings as state-ful objects with mutable semantics because that's not the original intention of the string.
That users of the string are building a string instead of "modifying an existing string" *should* be conveyed in the interface. This is largely an issue of documentation though.
Again, this is a matter of taste.
Actually, I think it's a matter of design, not taste.
Is the enforcing of our "superior" interface design really that much more important than the level of acceptability by other people who do not share the same opinion ?
What opinion is there to be had? If the string is immutable why would you want to make it look like it is mutable?
Nobody forces you to use append/ prepend and you should not force others to use the operator ^.
Well, the primitive data types force you to use the operators defined on them. Spirit forces you to define rules using the DSEL. So does the MSM library. The BGL forces you to use the graph abstraction if you intend to deal with that library. I don't see why it's unreasonable to force operator^ for consistency's sake.
IMO in this case you are even in an advantage, because append/ prepend/etc. would be wrappers around "your" :) interface. And, yes, they should be clearly documented as such.
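The wrapper idea can be sketched like this, assuming a hypothetical immutable `text` class with a joining `operator^` (all names are illustrative, not an existing or proposed Boost interface). Both member functions are const and return a new value, so nothing is mutated:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical immutable string-like type, for illustration only.
class text {
public:
    text(const char* s) : value_(s) {}
    explicit text(std::string s) : value_(std::move(s)) {}

    const std::string& str() const { return value_; }

    // Convenience wrappers; both build a NEW text and leave *this untouched.
    text append(const text& tail) const;
    text prepend(const text& head) const;

private:
    std::string value_;
};

// The primitive operation: join left and right into a new value.
inline text operator^(const text& lhs, const text& rhs) {
    return text(lhs.str() + rhs.str());
}

// The wrappers differ only in the order of the operands.
inline text text::append(const text& tail) const { return *this ^ tail; }
inline text text::prepend(const text& head) const { return head ^ *this; }

inline void wrapper_example() {
    const text s("Bar");
    text t = s.append("Baz").prepend("Foo");
    assert(t.str() == "FooBarBaz");
    assert(s.str() == "Bar");  // the original is unchanged
}
```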
But the point of the thing being immutable is lost in translation. More to the point, operator^ has simple semantics as opposed to 'append' and 'prepend' which are two words for the same operation with just the order of the operands switched around. Am I missing something here?

--
Dean Michael Berris
about.me/deanberris