Re: [boost] [string] proposal

28 Jan 2011

On Fri, Jan 28, 2011 at 10:31 PM, Dean Michael Berris
<mikhailberis@gmail.com> wrote:
...
On Sat, Jan 29, 2011 at 5:13 AM, Matus Chochlik <chochlik@gmail.com> wrote:
...
On Fri, Jan 28, 2011 at 9:46 PM, Dean Michael Berris
...
...
  All the discussion started because we need UTF-8
  in strings; now we are back to the beginning?
No, the discussion started because we need a UTF-8 view of data. You
missed the point I was making. And you didn't understand the document
I wrote.
Sorry, but no. The discussion started with the proposal that we should
by default treat std::strings as if they were UTF-8 encoded.
Artyom should know because he was the one who made the original
proposal. The whole 'view' idea was brought up only much later.
And the point I was making was that doing precisely this is the
"wrong" way of doing it. Assuming a default encoding is "unnecessary",
since an encoding is ultimately a matter of how the data is
interpreted.
I was attempting to solve the problem that is std::string. In the
process I'm moving the issue away from the underlying data and making
it a matter of interpretation. As I see it, the sensible way to do
that is to move it into a view of the data that is held in a string.
The string would be the data structure, the view an interpretation
of it.
I never precluded that the string can hold UTF-8 encoded data, but
saying that is the default achieves nothing and is ultimately
unnecessary. In the design I've been proposing the point of the matter
is, interpreting data in a given encoding is separate from how the
data is actually stored. Now let's say you have a UTF-8 string
builder, what else would that write in memory aside from UTF-8 encoded
data? It will still yield a string, though, which could be interpreted
many different ways -- I just don't see the encoding as something
intrinsic to the string. That means a string can hold UTF-8 encoded
data and I can wrap that in a view for UTF-16 and see that it will not
validate correctly -- unless I wrap the string with a view for UTF-8
first then pass that into a view for UTF-16 and transcoding can happen
on the fly.
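
The separation described above can be sketched roughly as follows. This is only an illustration, not the design actually proposed in this thread: the view template and utf8_encoding_tag are named in the discussion, but is_valid(), valid(), ascii_encoding_tag and demo_views() are hypothetical names made up here. ASCII stands in for UTF-16 as the "other" interpretation to keep the sketch short, and no transcoding is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Tag types. utf8_encoding_tag is named in the thread;
// ascii_encoding_tag is a hypothetical second encoding for contrast.
struct utf8_encoding_tag {};
struct ascii_encoding_tag {};

// ASCII: every byte must be below 0x80.
inline bool is_valid(const std::string& s, ascii_encoding_tag) {
    for (std::size_t i = 0; i < s.size(); ++i)
        if (static_cast<unsigned char>(s[i]) >= 0x80) return false;
    return true;
}

// UTF-8: check lead bytes, continuation bytes, overlong forms,
// surrogates and the U+10FFFF upper bound.
inline bool is_valid(const std::string& s, utf8_encoding_tag) {
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                      // stray continuation or invalid lead byte
        if (i + len > n) return false;          // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}

// The string is the data; the view is one interpretation of it.
// This sketch does not own the string, so the string must outlive the view.
template <class Encoding>
class view {
public:
    explicit view(const std::string& data) : data_(&data) {}
    bool valid() const { return is_valid(*data_, Encoding()); }
    const std::string& raw() const { return *data_; }
private:
    const std::string* data_;
};

// Same bytes, two interpretations, two different answers.
inline void demo_views() {
    const std::string bytes("h\xC3\xA9llo");  // "héllo" encoded as UTF-8
    assert(view<utf8_encoding_tag>(bytes).valid());
    assert(!view<ascii_encoding_tag>(bytes).valid());
}
```

The bytes in the std::string never change; only the tag passed to view decides whether they make sense.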
Writing algorithms that deal with strings is different from writing
algorithms that deal with encoded text. Those are two different levels.
Explaining the whole point of the matter over and over makes me sound
like a broken record. If you still don't get what I'm saying, then I
guess I'm going to have to try a different route and just show what I
mean in code at some point.
Dean, believe me, I got what you said the first time you said
it, like 200 posts ago. I know that the string data is ultimately
stored in the memory as a sequence of bytes. But then you
proposed to solve my problem by suggesting the view<Encoding>
template. Then like 50 posts ago we finally agreed on typedef-ing
and naming it 'text' since using something called view<encoding_tag>
is not acceptable for me.

Now, if this

typedef view<utf8_encoding_tag> text;

is the only line of code where I see the encoding and
I'll be able to do all the text handling, i.e.: searching
for code points/characters (not only bytes), searching for
words, concatenation, splitting, writing it into a file, socket,
etc. and reading it from file, socket, etc., using it
with some c_str-like adapter with C APIs, etc., basically
doing (nearly) everything that I was able to do with std::string
*without* ever mentioning the encoding again, then you already
have me convinced. If I cannot do those things without specifying
the encoding (unless necessary) then this is useless for me
for text handling.
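
To make the requirement concrete, here is a rough sketch of what such a 'text' type might look like from the user's side. Everything except the typedef line and the names view and utf8_encoding_tag is an assumption made up for illustration: count_code_points(), length(), find(), operator+ and demo_text() are hypothetical, this view keeps its own copy of the bytes for simplicity, and the input is assumed to be well-formed UTF-8 (no validation is done).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct utf8_encoding_tag {};  // name taken from the thread

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes well-formed UTF-8.
inline std::size_t count_code_points(const std::string& s, utf8_encoding_tag) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical view that stores a copy of the bytes it interprets.
template <class Encoding>
class view {
public:
    view() {}
    explicit view(const std::string& bytes) : bytes_(bytes) {}

    // Length in code points, not bytes.
    std::size_t length() const { return count_code_points(bytes_, Encoding()); }

    // c_str-like adapter for C APIs.
    const char* c_str() const { return bytes_.c_str(); }

    view& operator+=(const view& other) { bytes_ += other.bytes_; return *this; }

    // Code point index of the first occurrence of needle, or std::string::npos.
    // A byte-wise search is enough for well-formed UTF-8, because a match of a
    // well-formed needle always starts on a code point boundary.
    std::size_t find(const view& needle) const {
        const std::size_t pos = bytes_.find(needle.bytes_);
        if (pos == std::string::npos) return std::string::npos;
        return count_code_points(bytes_.substr(0, pos), Encoding());
    }
private:
    std::string bytes_;
};

template <class Encoding>
view<Encoding> operator+(view<Encoding> lhs, const view<Encoding>& rhs) {
    lhs += rhs;
    return lhs;
}

// The only line that mentions the encoding.
typedef view<utf8_encoding_tag> text;

inline void demo_text() {
    text greeting("na\xC3\xAFve");  // "naïve": 5 code points, 6 bytes
    text tail(" caf\xC3\xA9");      // " café": 5 code points, 6 bytes
    text all = greeting + tail;

    assert(all.length() == 10);              // counted in code points
    assert(std::strlen(all.c_str()) == 12);  // a C API still sees the raw bytes
    assert(all.find(text("caf")) == 6);      // index in code points, not bytes
}
```

After the typedef, demo_text() never names the encoding again, which is the property asked for above. Word search, splitting and file or socket I/O are left out of the sketch.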

Peace, Love, Best regards,

Matus