Re: [boost] [string] proposal

26 Jan 2011

On Wed, Jan 26, 2011 at 3:22 PM, Dean Michael Berris
<mikhailberis@gmail.com> wrote:
...
On Wed, Jan 26, 2011 at 5:01 PM, Matus Chochlik <chochlik@gmail.com> wrote:
[snip/]
...
I didn't say that I regard the immutability or value semantics to
be an implementation detail. But part of the discussion
focused on whether we should employ COW, how to implement it,
etc.
Sure, which is also where the reference counting implementation lies.
Details like that are deal-breakers in performance-critical code and
if we're talking about replacing std::string or implementing a
competing string, it would have to beat the std::string performance
(however bad/good that is).
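For readers who haven't seen one, here is a minimal sketch of the reference-counting idea, assuming std::shared_ptr as the counting mechanism. The class name shared_string and its members are hypothetical, not anything proposed in this thread. Copies share one immutable buffer, so a copy costs a reference-count update instead of a character copy. std::shared_ptr typically updates that count atomically, which is exactly the kind of per-copy cost that matters when measuring against std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string whose copies share one heap buffer via
// std::shared_ptr. Copying updates a reference count instead of copying
// characters. An illustration only, not a proposed Boost API.
class shared_string {
public:
    shared_string() : buf_(std::make_shared<std::string>()) {}
    explicit shared_string(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    // Copy/move constructors and assignments are compiler-generated:
    // they copy or move the shared_ptr, never the characters.

    const char* c_str() const { return buf_->c_str(); }
    std::size_t size() const { return buf_->size(); }

    // Exposed only so this sketch can show that the buffer is shared.
    bool shares_buffer_with(const shared_string& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<const std::string> buf_;  // const: nobody mutates it
};
```

Whether this beats std::string depends on the workload: implementations that use a small-string optimization copy short strings without any heap allocation or atomic operation at all.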
...
Value semantics - a part of the interface specification -
can be implemented in a number of ways.
I don't see, though, how else value semantics can be implemented aside
from having: a default constructor, an assignment operator, a copy
constructor, and later on maybe an optimal move constructor. That's
really all there is to value semantics -- some would argue that swap
is "necessary" too but I'm not convinced that swap is really required
for value semantics.
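The members listed here can be checked mechanically. A sketch with a hypothetical value_string type, using only <type_traits>: no member swap is declared, yet std::swap still works because it only needs move construction and move assignment.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical value type exposing exactly the members named above.
class value_string {
public:
    value_string() = default;                                // default constructor
    explicit value_string(std::string s) : s_(std::move(s)) {}
    value_string(const value_string&) = default;             // copy constructor
    value_string& operator=(const value_string&) = default;  // assignment operator
    value_string(value_string&&) = default;                  // move constructor
    value_string& operator=(value_string&&) = default;       // move assignment

    const std::string& str() const { return s_; }

    friend bool operator==(const value_string& a, const value_string& b) {
        return a.s_ == b.s_;
    }

private:
    std::string s_;
};

// Compile-time checklist for the value-semantics members.
static_assert(std::is_default_constructible<value_string>::value, "default ctor");
static_assert(std::is_copy_constructible<value_string>::value, "copy ctor");
static_assert(std::is_copy_assignable<value_string>::value, "assignment");
static_assert(std::is_nothrow_move_constructible<value_string>::value, "move ctor");
```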
I may be wrong, but when I hear you say that a type has
a default constructor, an assignment operator, etc., I take you to be
talking about the interface of the type. When you explain how the
assignment operator, etc. is implemented, then you are talking about
implementation details :)

[snip/]
...
So it's the algorithms that are the problem -- for being encoding
agnostic -- and not really the string? Is that what you're implying?
[snip/]
...
...
...
1. This is totally fine with an immutable string implementation. I
don't see any mutations going on here.
Me neither :-) What I see however is that it fails because
of encoding.
I still don't understand this though. What does encoding have to do
with the string? Isn't encoding a separate process?
Hm, my ability to express myself obviously totally su*ks :)
You are completely right that the encoding is a completely
separate process, and I'm saying that I want it *completely*
hidden from my sight, unless it is absolutely necessary
for me to be concerned about it :-)
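The example that fails was snipped, so it isn't reproduced here. As a hypothetical illustration of how an encoding-agnostic operation gives a surprising answer: std::string::size() counts bytes, so for the UTF-8 spelling of "Matúš" it reports 7, while a user would count 5 characters. The helper below is an illustrative name that counts code points and assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 byte sequence by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // lead byte or ASCII starts a code point
    }
    return n;
}
```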

The means for this would be: let us build a string that may
(or may not) be based on your general (encoding-agnostic)
string. This string would handle the transcoding in most
cases without making me look at the underlying byte sequence
through functors that require me to specify explicitly, *every time*,
which encoding I want. By default I want UTF-8; if I talk to the OS, I
say I want the string in the encoding that the OS expects, not
that I want it in UTF-16, ISO-8859-2, KOI8-R, etc.
If and only if I want to handle the string in an encoding other
than Unicode should I have to specify that explicitly.
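A minimal sketch of the kind of string described here, under loud assumptions: the class utf8_text and its members str(), to_utf16() and native() are hypothetical; the input is assumed to be valid UTF-8; and "what the OS expects" is approximated as UTF-16 (the wide Win32 API) on Windows and plain UTF-8 bytes elsewhere. UTF-8 is the default, and the caller asks for the OS form rather than naming an encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical string that always stores UTF-8 and transcodes only on request.
class utf8_text {
public:
    explicit utf8_text(std::string utf8) : bytes_(std::move(utf8)) {}

    // Default view: the stored UTF-8 bytes, no conversion.
    const std::string& str() const { return bytes_; }

    // Explicit UTF-8 -> UTF-16 conversion. Assumes bytes_ is valid UTF-8.
    std::u16string to_utf16() const {
        std::u16string out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            unsigned char c = static_cast<unsigned char>(bytes_[i]);
            char32_t cp;
            std::size_t len;
            if (c < 0x80)      { cp = c;        len = 1; }  // ASCII
            else if (c < 0xE0) { cp = c & 0x1F; len = 2; }  // 110xxxxx
            else if (c < 0xF0) { cp = c & 0x0F; len = 3; }  // 1110xxxx
            else               { cp = c & 0x07; len = 4; }  // 11110xxx
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(bytes_[i + k]) & 0x3F);
            i += len;
            if (cp < 0x10000) {
                out.push_back(static_cast<char16_t>(cp));
            } else {  // outside the BMP: encode as a surrogate pair
                cp -= 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
        }
        return out;
    }

    // "The encoding the OS expects", chosen at compile time (assumption).
#ifdef _WIN32
    std::wstring native() const {  // wide Win32 functions take UTF-16
        std::u16string u = to_utf16();
        return std::wstring(u.begin(), u.end());
    }
#else
    const std::string& native() const { return bytes_; }  // assumes a UTF-8 locale
#endif

private:
    std::string bytes_;
};
```

A real implementation would validate its input and would more likely delegate the conversion to an existing library than hand-roll it.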

[snip/]
...
How about Boost.RangeEx-wrapped STL algorithms?
I, for one, like the simplicity and flexibility of it, which may explain
why I think we have different interpretations of "convenient". For me,
iterators and layering operations on iterators, and then feeding them
through algorithms, is the convenient route. Anything that resembles
Java code or Smalltalk-like "OOP message-passing"-inspired interfaces
just doesn't seem enticing to my brain anymore.
This is a different matter. Again, I may be wrong, but I am
under the impression that RangeEx has been implemented
to hide the ugliness of complex STL iterator-based algorithms.
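To make the iterator-layering style concrete without relying on Boost.Range's adaptor API, here is a sketch that uses only standard iterator adaptors and algorithms. ends_with and to_upper_ascii are hypothetical helper names. Note that to_upper_ascii works byte by byte, so it is only correct for ASCII text, which ties back to the encoding question.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Suffix test: an iterator adaptor plus an algorithm. rbegin()/rend()
// yield std::reverse_iterator, so std::equal walks both strings backwards.
bool ends_with(const std::string& s, const std::string& suffix) {
    return suffix.size() <= s.size() &&
           std::equal(suffix.rbegin(), suffix.rend(), s.rbegin());
}

// Layering on the output side: std::back_inserter turns a container into an
// output iterator, and std::transform feeds each byte through a function.
// Uses std::toupper in the default "C" locale, so it only handles ASCII.
std::string to_upper_ascii(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    std::transform(s.begin(), s.end(), std::back_inserter(out),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return out;
}
```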

[snip/]
...
...
To be more "complete" about it, though, the semantics of "+" on strings
is really a misnomer. The "+" operator signifies commutativity, which
string concatenation does not have -- and you're really not adding string
values either. What you want is an operator that conveys "I'm joining
the string on the left with the one on the right in the specified
order" -- because the "^" operator is left associative and can be used
as a joining symbol, it fits the use case for strings better.
So you read it as: "Foo" joined with "Bar" joined with ...
I know that, of course, because we are having this discussion,
but will it be clear to someone who is not participating? It may become
clear when the string gets wider adoption.
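A sketch of what the operator^ spelling could look like on an immutable string. istring is a hypothetical name, and nothing here is an existing Boost interface. Each ^ builds a new value and leaves its operands untouched, and swapping the operands changes the result, which is the ordering the word "join" is meant to convey.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string using operator^ as "join".
class istring {
public:
    // Implicit on purpose, so that istring("Foo") ^ "Bar" compiles.
    istring(const char* s) : data_(std::make_shared<std::string>(s)) {}

    const std::string& str() const { return *data_; }

    // Builds a new value; neither operand is modified. Like any use of ^,
    // a chain groups left to right: (a ^ b) ^ c.
    friend istring operator^(const istring& lhs, const istring& rhs) {
        return istring(lhs.str() + rhs.str());
    }

private:
    explicit istring(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    std::shared_ptr<const std::string> data_;
};
```

One practical caveat with this choice: ^ has lower precedence than == in C++, so `foo ^ "Bar" == s` parses as `foo ^ ("Bar" == s)`. Comparisons involving a join therefore need parentheses, as in `(foo ^ "Bar") == s`.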
...
I still don't understand what "nice" is. Precisely because
"nice" is such a subjective thing, I fear that without any objective
criterion on which to base the definition of an interface/implementation, we
will keep chasing after what's "nice" or "convenient".
OTOH, if we agree that algorithms are as important as the abstraction,
then I think it's better if we agree on what the correct abstraction is
and which algorithms we intend to implement/support. In that
discussion, what's "nice" is largely a matter of taste. ;)
OK, I think that it is pointless to discuss "nice" :) exactly because
it is very subjective.
...
...
...
Also, last time I checked, there are already a ton of Unicode-encoding
libraries out there; I don't see why there's a need for
yet-another-encoding-library for character strings. This is why I
like the way Boost.Locale is handling it: it conveys that the
library is about providing a common interface into which different
back-ends can be plugged. If Boost.Locale dealt with iterators,
then I think having a string library that is better than std::string
in more ways than one would give us a good way of tackling
the cross-platform string encoding issue. But there, I stress, I think
C++ needs a better string implementation than the standard one.
And what is their level of acceptance among different APIs?
I think we need to qualify what you refer to as APIs. Judging just
from the amount of code that's written against Qt or MFC, for example,
I'd say "they're pretty well accepted". If you look at the
libraries that use ICU as a backend, I'd say we already have one in
Boost called Boost.Regex. And there are all these other libraries in the
Linux arena that have their own little niche to play in the Unicode
game -- there's GLib, the GNOME and KDE libraries, ad nauseam.
Besides what you mentioned, an API for me is, for example, the
WINAPI, the POSIX API, the OpenGL API, the OpenSSL API, etc.
Basically all the functions "exported" by the various C/C++
libraries that I cannot imagine my life without :), which
expect not a generic iterator range or a view or whatnot
but a plain and simple pointer (const char*) to a contiguous
block of memory containing a zero-terminated C string,
or, if we are luckier, a std::string.
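The practical consequence, sketched with the standard library only: std::string::c_str() hands a C API its buffer directly, while a generic iterator range (here a sub-range of a string, which is not zero-terminated where it ends) has to be copied into a terminated buffer first. c_api is a hypothetical stand-in for a WINAPI/POSIX-style function, implemented with std::strlen; call_with and call_with_range are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for any C API taking a zero-terminated string.
std::size_t c_api(const char* zstr) { return std::strlen(zstr); }

// std::string already owns a contiguous, zero-terminated buffer,
// so passing it to a C API needs no copy.
std::size_t call_with(const std::string& s) { return c_api(s.c_str()); }

// A generic iterator range has no such guarantee: copy it into a
// contiguous buffer and terminate it explicitly before the call.
template <typename Iterator>
std::size_t call_with_range(Iterator first, Iterator last) {
    std::vector<char> buf(first, last);
    buf.push_back('\0');
    return c_api(buf.data());
}
```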
...
What opinion is there to be had? If the string is immutable why would
you want to make it look like it is mutable?
...
Nobody forces you to use append/
prepend, and you should not force others to use operator^.
Well, the primitive data types force you to use the operators defined
on them. Spirit forces you to define rules using the DSEL. So does the
MSM library. The BGL forces you to use the graph abstraction if you
intend to deal with that library.
I don't see why it's unreasonable to force operator^ for consistency's sake.
...
IMO, in this case you are even at an advantage, because append/
prepend/etc. would be wrappers around "your" :) interface.
And, yes, they should be clearly documented as such.
But the point of the thing being immutable is lost in translation.
More to the point, operator^ has simple semantics as opposed to
'append' and 'prepend' which are two words for the same operation with
just the order of the operands switched around.
Am I missing something here?
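A sketch of the suggestion that append/prepend be thin, documented wrappers over the operator^ primitive. istring, append and prepend are hypothetical names, not an agreed interface. It also makes the point about operand order visible: the two wrappers differ only in which side the existing string goes on.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal hypothetical immutable string: no mutating members at all.
class istring {
public:
    explicit istring(std::string s) : s_(std::move(s)) {}
    const std::string& str() const { return s_; }

    // The single primitive: join the left operand with the right one, in order.
    friend istring operator^(const istring& lhs, const istring& rhs) {
        return istring(lhs.str() + rhs.str());
    }

private:
    std::string s_;
};

// Convenience wrappers layered on operator^ (hypothetical names).
// Both perform the same join; only the operand order differs.
inline istring append(const istring& s, const istring& tail) { return s ^ tail; }
inline istring prepend(const istring& s, const istring& head) { return head ^ s; }
```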
I see your point of view. You imagine this new string class
to be a completely new beast. I, and I expect a few others,
view it as the next std::string. I don't see any big point
in creating another uber-string that is *so much*
better in performance, etc., if it does not get wide adoption.
There are already dozens of such strings.

Best,

Matus