Re: [boost] [string] proposal

26 Jan 2011

On Wed, Jan 26, 2011 at 5:01 PM, Matus Chochlik <chochlik@gmail.com> wrote:
...
On Wed, Jan 26, 2011 at 9:25 AM, Dean Michael Berris
<mikhailberis@gmail.com> wrote:
...
On Wed, Jan 26, 2011 at 3:47 PM, Matus Chochlik <chochlik@gmail.com> wrote:
...
On Fri, Jan 21, 2011 at 1:07 PM, Dean Michael Berris
<mikhailberis@gmail.com> wrote:
...
[snip/]
I also prefer nothing too fancy. But most of these things
are implementation details, let us get the interface
right first and focus on the optimizations afterwards.
Actually, it's not an implementation detail. Value semantics has
everything to do with the interface and not the implementation.
It's just that, at the time I was thinking about and writing this
reply, I was just really wanting something lightweight and allowed for
unbridled cross-thread access. That original assumption of mine that
reference counting was a bad thing has since been clarified by others
in the ensuing threads.
I didn't say that I regard the immutability or value semantics to
be an implementation detail. But some part of the discussion
focused on if we should employ COW, how to implement it,
etc.
Sure, which is also where the reference counting implementation lies.
Details like that are deal-breakers in performance-critical code and
if we're talking about replacing std::string or implementing a
competing string, it would have to beat the std::string performance
(however bad/good that is).
...
Value semantics - a part of the interface specification -
can be implemented in a number of ways.
I don't see though how else value semantics can be implemented aside
from having: a default constructor, an assignment operator, a copy
constructor, and later on maybe an optimal move constructor. That's
really all there is to value semantics -- some would argue that swap
is "necessary" too but I'm not convinced that swap is really required
for value semantics.
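
A minimal sketch of the operations listed above, assuming a reference-counted immutable buffer. The name istring and its members are hypothetical and are not part of any proposed Boost interface; std::shared_ptr stands in for whatever reference counting scheme would actually be chosen.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type for illustration only.
class istring {
public:
    // Default constructor: an empty string.
    istring() : data_(std::make_shared<std::string>()) {}

    // Construct from a non-null C string; the characters are copied once, here.
    istring(const char* s) : data_(std::make_shared<std::string>(s)) {}

    // Copy and move construction and assignment are compiler-generated.
    // A copy only increments the reference count. A moved-from istring
    // holds a null pointer and may only be assigned to or destroyed.

    const std::string& str() const { return *data_; }

    bool shares_buffer_with(const istring& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;  // contents never change
};

inline bool demo_value_semantics() {
    istring a("Foo");
    istring b(a);  // copy constructor
    istring c;     // default constructor
    c = b;         // copy assignment
    assert(c.str() == "Foo");
    return a.shares_buffer_with(b) && b.shares_buffer_with(c);
}
```

Because the buffer is never written after construction, copies can share it freely; swap is not needed for any of the above.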
...
...
But we already have these everyday nice and convenient text handling
algorithms in Boost.Algorithm's String_algo library.
But still it is encoding agnostic, which is bad in many cases.
So it's the algorithms that are the problem -- for being encoding
agnostic -- and not really the string? Is that what you're implying?
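
To make "encoding agnostic" concrete, here is a small sketch, assuming the input is valid UTF-8. The helper utf8_code_points is hypothetical; it only illustrates that a byte-oriented operation such as size() and an encoding-aware one can disagree on the same data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts code points in a string assumed to hold valid UTF-8 by counting
// every byte that is not a continuation byte (bit pattern 10xxxxxx).
inline std::size_t utf8_code_points(const std::string& s) {
    return static_cast<std::size_t>(
        std::count_if(s.begin(), s.end(), [](unsigned char c) {
            return (c & 0xC0) != 0x80;
        }));
}

inline bool demo_bytes_vs_code_points() {
    const std::string e_acute = "\xC3\xA9";  // U+00E9 encoded as two bytes
    return e_acute.size() == 2 && utf8_code_points(e_acute) == 1;
}
```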
...
...
As a matter of fact, *all* the implementations cited about dealing
with UTF-8 and UTF-16 have everything to do with wrapping raw data
into a view of it that (unfortunately) allows for mutating
transformations.
Note also that I wasn't even going into the generic point of strings not
being a sequence of anything other than characters to be read. That's
a different topic that I don't want to get into at this time. But even
the pedantic definition of a string doesn't include mutability as an
intrinsic requirement.
I really do not have anything against the immutability
and the value semantics, see above. I think you
misunderstood me :)
I think I didn't understand what you meant when you referred to
implementation details. ;)
...
...
A few things here:
1. This is totally fine with an immutable string implementation. I
don't see any mutations going on here.
Me neither :-) What I see however is that it fails because
of encoding.
I still don't understand this though. What does encoding have to do
with the string? Isn't encoding a separate process?
...
...
3. String I/O can be defined independently of the string especially if
you're dealing with C++ streams. I don't see why the above would be a
problem with an immutable string implementation.
Agreed, but again it has to be convenient.
We may have different definitions of convenient, but I like the way
string-streams and iostreams in the standard do it. For all intents
and purposes a stringbuf implementation that deals with efficient
allocation precisely for immutable strings would be nice to have.
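
As a rough sketch of that division of labour, the standard std::ostringstream can do the building and hand over a finished value that is never touched again. The function build_message is hypothetical, and a const std::string stands in for the immutable string type; the specialised stringbuf mentioned above is not implemented here.

```cpp
#include <sstream>
#include <string>

// All mutation happens inside the stream; the caller only sees the result.
inline std::string build_message(const std::string& user, int unread) {
    std::ostringstream out;
    out << user << " has " << unread << " unread messages";
    return out.str();
}

inline bool demo_stream_building() {
    const std::string msg = build_message("dean", 3);  // never modified again
    return msg == "dean has 3 unread messages";
}
```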
...
...
So you'd say yuck to any STL algorithm that dealt with iterators? Have
you used the Boost.Iterators library yet because then you'd be calling
all those chaining/wrapping operations "yucky" too. ;)
Some of them ? Yes, in many situations.
How about Boost.RangeEx-wrapped STL algorithms?

I for one like the simplicity and flexibility of it which may explain
why I think we have different interpretations of "convenient". For me,
iterators and layering operations on iterators, and then feeding them
through algorithms is the convenient route. Anything that resembles
Java code or Smalltalk-like "OOP message-passing" inspired interfaces
just doesn't seem enticing to my brain anymore.
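
A small sketch of what layering iterators and feeding them through an algorithm looks like, using only the standard library (reverse iterators plus std::transform) rather than Boost.Iterators or Boost.RangeEx. The function upper_reversed is a hypothetical example; the source string is read only and the result is a new value.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Reads the input back to front through reverse iterators, upper-cases each
// character, and writes the result into a fresh string.
inline std::string upper_reversed(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::transform(in.rbegin(), in.rend(), std::back_inserter(out),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return out;
}
```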

Maybe that's more a problem with me than with the code although the
jury's still out on that one. ;)
...
[snip/]
...
But the problem there is "nice" is really subjective. I absolutely
abhor code like this:
 boost::string s = "Foo";
 s.append("Bar").append("Baz");
When I can express it entirely with fewer characters and succinctly
with this instead:
 boost::string s = "Foo" ^ "Bar" ^ "Baz";
Agreed, this is a matter of opinion and while
I see the beauty of what you propose, it may
not be clear what you mean by "Foo" ^ "Bar".
If I learned something from this whole discussion,
then it is that it's not nice to shove anything (programming
style included) down anyone's throat :-)
Right on both counts. :D

To be more "complete" about it though the semantics of "+" on strings
is really a misnomer. The "+" operator signifies commutativity, which
string concatenation lacks -- and you're really not adding string
values either. What you want is an operator that conveys "I'm joining
the string on the left with the one on the right in the specified
order" -- because the "^" operator is left associative and can be used
as a joining symbol, it fits the use case for strings better.

So you read it as: "Foo" joined with "Bar" joined with ...
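
A minimal sketch of such a joining operator, assuming a hypothetical joined_string type. One caveat that is plain C++ rather than opinion: operators cannot be overloaded for two built-in operands, so at least one side of each "^" must already be of class type, and a bare "Foo" ^ "Bar" between two literals will not compile.

```cpp
#include <string>
#include <utility>

// Hypothetical immutable string; the buffer is never modified after
// construction.
class joined_string {
public:
    joined_string(const char* s) : data_(s) {}  // s must be non-null
    explicit joined_string(std::string s) : data_(std::move(s)) {}
    const std::string& str() const { return data_; }

private:
    std::string data_;
};

// Joins lhs and rhs in the order written and returns a new value.
inline joined_string operator^(const joined_string& lhs,
                               const joined_string& rhs) {
    return joined_string(lhs.str() + rhs.str());
}

inline std::string demo_join() {
    // Left associative: (("Foo" ^ "Bar") ^ "Baz").
    const joined_string s = joined_string("Foo") ^ "Bar" ^ "Baz";
    return s.str();
}
```

Note that "^" binds more loosely than "==" and "<<", so an expression such as a ^ b == c needs parentheses around the join.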
...
...
I think you're missing something here though.
The point of creating a new string implementation is so that you can
generalize a whole family of string-related algorithms around a
well-defined abstraction. In this case there's really no question that
a string of characters is used to represent "text" -- although it can
very well represent a lot of other things too. However you cut it
though, the abstraction is born out of algorithms that have something to
do with strings like: concatenation, compression, ordering, encoding,
decoding, rendering, sub-string, parsing, lexical analysis, search,
etc.
And I think you misunderstand me, I *do not* want to stop us
from doing such implementation of string. But just as it is important
for you to have the generic string class, it is important for me to have
the "nice" 'text' class :) I even don't have anything against
boost::text to be implemented as a special case of boost::string
if it is possible/wise.
I still don't understand what "nice" is. I think precisely because
"nice" is such a subjective thing I fear that without any objective
criterion to base the definition of an interface/implementation on, we
will keep chasing after what's "nice" or "convenient".

OTOH if we agree that algorithms are as important as the abstraction,
then I think it's better if we agree what the correct abstraction is
and what the algorithms we intend to implement/support are. In that
discussion what's "nice" is largely a matter of taste. ;)
...
...
Also, last time I checked, there are already a ton of Unicode-encoding
libraries out there, I don't see why there's a need for
yet-another-encoding-library for character strings. This is why I
think I'm liking the way Boost.Locale is handling it because it
conveys that the library is about making a common interface through
which different back-ends can be plugged into. If Boost.Locale dealt
with iterators then I think having a string library that is better
than std::string in more ways than one gives us a good way of tackling
the cross-platform string encoding issue. But there I stress, I think
C++ needs a string implementation better than the standard one.
And what is their level of acceptance by different APIs ?
I think we need to qualify what you refer to as APIs. If just judging
from the amount of code that's written against Qt or MFC for example
then I'd say "they're pretty well accepted". If you look at the
libraries that use ICU as a backend I'd say we already have one in
Boost called Boost.Regex. And there's all these other libraries in the
Linux arena that have their own little niche to play in the Unicode
game -- there's Glib, the GNOME and KDE libraries, ad nauseam.

I really think we don't want to be playing the "one ring in the
darkness bind them" game here. If we want to change the way things are
going, there's little point in preserving the status quo IMO
especially if we're all in agreement that the status quo is broken.
And now that we've decided that it's something worth fixing let's fix
it in a way that's actually different from how everyone else has tried
to do before. Doing the same thing over and over and expecting a
different result is insanity -- paraphrased from someone important
whose name I really should know. ;)
...
...
I think this is a slippery slope though. If we make the boost::string
look like something that is mutable without it being really mutable,
then you have a disconnect between the interface and the semantics you
want to convey.
Having member functions like 'append' and 'prepend' makes you think
that you're modifying the string when in fact you're really building
another string. I've already pointed out that string construction can
very well be handled by the string streams so I don't think we want to
encourage people to think of strings as state-ful objects with mutable
semantics because that's not the original intention of the string.
That users are building a string instead of "modifying an existing
string" *should* be conveyed in the interface. This is largely an issue
of documentation though.
Again, this is a matter of taste.
Actually, I think it's a matter of design, not taste.
...
Is the enforcing of our "superior" interface design really that much
more important than the level of acceptability by other people who
do not share the same opinion ?
What opinion is there to be had? If the string is immutable why would
you want to make it look like it is mutable?
...
Nobody forces you to use append/
prepend and you should not force others to use the operator ^.
Well, the primitive data types force you to use the operators defined
on them. Spirit forces you to define rules using the DSEL. So does the
MSM library. The BGL forces you to use the graph abstraction if you
intend to deal with that library.

I don't see why it's unreasonable to force operator^ for consistency's sake.
...
IMO in this case you are even in an advantage, because append/
prepend/etc. would be wrappers around "your" :) interface.
And, yes, they should be clearly documented as such.
But the point of the thing being immutable is lost in translation.
More to the point, operator^ has simple semantics as opposed to
'append' and 'prepend' which are two words for the same operation with
just the order of the operands switched around.
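
To illustrate the "same operation, operands switched" point, here is a sketch in which append and prepend are thin wrappers over operator^. The type imm_text and the choice of free functions (rather than members) are assumptions made for the example; both wrappers return a new value and leave their arguments untouched.

```cpp
#include <string>
#include <utility>

// Hypothetical immutable text value.
class imm_text {
public:
    explicit imm_text(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }

private:
    std::string value_;  // never modified after construction
};

inline imm_text operator^(const imm_text& lhs, const imm_text& rhs) {
    return imm_text(lhs.str() + rhs.str());
}

// Both wrappers are the same join; only the operand order differs.
inline imm_text append(const imm_text& s, const imm_text& tail) {
    return s ^ tail;
}

inline imm_text prepend(const imm_text& s, const imm_text& head) {
    return head ^ s;
}
```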

Am I missing something here?

-- 
Dean Michael Berris
about.me/deanberris