
On Thu, Jan 27, 2011 at 4:49 AM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
On Thu, Jan 27, 2011 at 2:28 AM, Matus Chochlik <chochlik@gmail.com> wrote:
On Wed, Jan 26, 2011 at 6:26 PM, Dean Michael Berris <mikhailberis@gmail.com> wrote:
On Thu, Jan 27, 2011 at 12:43 AM, Matus Chochlik <chochlik@gmail.com> wrote:
So really this wrapper is the 'view' that I talk about that carries with it an encoding and the underlying data. Right?
Basically right, but generally I can imagine that the encoding would not have to be 'carried' by the view but just 'assumed'. But if by carrying you mean that it'll have just some tag template argument without any (too much) internal state, then: just Right.
Being part of the type is "carrying".
Yes, but a polymorphic type could also "carry" the encoding information; it wasn't clear to me what exactly you had in mind. [snip/]
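To make sure we mean the same thing, here is a minimal sketch of the tag-only variant I have in mind (all the names here, view, utf8_encoding and so on, are just illustrative):

#include <string>

// Illustrative tag types, one per encoding; they carry no state.
struct utf8_encoding {};
struct latin1_encoding {};

// The view refers to the underlying bytes and "carries" the encoding
// only in its type, so it holds nothing beyond one pointer.
template <typename Encoding>
class view
{
public:
    explicit view(std::string const& raw) : raw_(&raw) {}
    std::string const& bytes() const { return *raw_; }
    // per-Encoding decoding/iteration would be implemented here
private:
    std::string const* raw_;  // the view does not own the data
};

// usage: view<utf8_encoding> v(some_utf8_bytes);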
I don't think I was questioning why UTF-8 specifically. I was questioning why there had to be a "default is UTF-8" when really it's just a sequence of bytes whether UTF-8, MPEG, Base64, MIME, etc.
Last time I checked, JPEG, MPEG, Base64, ASN1, etc., etc., were not *text* encodings. And I believe that handling text is what the whole discussion is ultimately about. [snip/]
But this already happens, it's called 7-bit clean byte encoding -- barring any endianness issues, just stuff whatever you already have in a `char const *` into a socket. HTTP, FTP, and even memcached's protocol work fine without the need to interpret strings as anything other than a sequence of bytes; my original opposition is to having a string that by default looks at the data in it as UTF-8, when really a string is just a sequence of bytes, not necessarily contiguous.
Again, where you see a string primarily as a class for handling raw data that can be interpreted in hundreds of different ways, I see a string primarily as a class for encoding human-readable text. [snip/]
if there is ...
typedef something_beyond_my_current_level_of_comprehension native_encoding;
typedef ... native_wide_encoding;
On second thought, this should probably be a type templated on a character type.
... which works as I described above with your view ...
text my_text = init();
and I can do, for example: ShellExecuteW( ..., c_str(view<native_wide_encoding>(my_text)), ... );
... (and we can have some shorthand for c_str(view<native_wide_encoding>(x))), then, basically, Right.
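Spelled out a bit more, a rough sketch of what that could look like; the names text, encoded_view and c_str, and the transcoding via std::mbstowcs, are only stand-ins for the real thing:

#include <cstdlib>
#include <string>
#include <utility>

// Illustrative tag for whatever wide encoding the platform APIs expect
// (UTF-16 on Windows).
struct native_wide_encoding {};

// Illustrative immutable text type; internally it keeps UTF-8 bytes.
class text
{
public:
    explicit text(std::string utf8) : bytes_(std::move(utf8)) {}
    std::string const& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// The wide view transcodes on construction and owns the converted buffer,
// so the pointer returned by c_str() is valid for the view's lifetime.
// (std::mbstowcs uses the current locale and is only a placeholder for
// proper UTF-8 to UTF-16/UTF-32 transcoding.)
template <typename Encoding> class encoded_view;

template <> class encoded_view<native_wide_encoding>
{
public:
    explicit encoded_view(text const& t)
        : wide_(t.bytes().size() + 1, L'\0')
    {
        std::mbstowcs(&wide_[0], t.bytes().c_str(), wide_.size());
    }
    wchar_t const* c_str() const { return wide_.c_str(); }
private:
    std::wstring wide_;
};

template <typename Encoding>
encoded_view<Encoding> view(text const& t) { return encoded_view<Encoding>(t); }

template <typename Encoding>
wchar_t const* c_str(encoded_view<Encoding> const& v) { return v.c_str(); }

// e.g.  ShellExecuteW(..., c_str(view<native_wide_encoding>(my_text)), ...);
// the temporary view lives until the end of the full expression, i.e. for
// the whole duration of the call.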
Yes, that's the intention. There's even an alternative (a really F'n ugly way) I suggested as well:
char evil_stack_buffer[256];
linearize(string, evil_stack_buffer, 256);
Of course it is an alternative, but there are also lots of functions in various APIs (the ShellExecute above being one of them) where you would need 4-10 such evil_stack_buffers, and the performance gain, compared to the loss related to the ugliness and C-ness of the code, is not worth it (for me). If I liked that kind of programming I would use C all the time and not only in the places where it is absolutely necessary.
Which means you can let the user of the interface define where the linearized version of the immutable string would be placed.
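A rough sketch of that shape of interface, assuming a hypothetical immutable string stored as non-contiguous chunks:

#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable string stored as non-contiguous chunks.
class istring
{
public:
    explicit istring(std::vector<std::string> chunks) : chunks_(std::move(chunks)) {}
    std::vector<std::string> const& chunks() const { return chunks_; }
private:
    std::vector<std::string> chunks_;
};

// Copy at most size - 1 bytes into the caller-supplied buffer and
// zero-terminate it; the caller decides where that buffer lives
// (stack, arena, wherever) and gets back the number of bytes written.
std::size_t linearize(istring const& s, char* buffer, std::size_t size)
{
    if (size == 0) return 0;
    std::size_t written = 0;
    for (std::string const& chunk : s.chunks())
    {
        std::size_t const n = std::min(chunk.size(), size - 1 - written);
        std::copy_n(chunk.data(), n, buffer + written);
        written += n;
        if (written == size - 1) break;
    }
    buffer[written] = '\0';
    return written;
}

// usage, mirroring the snippet above:
//   char evil_stack_buffer[256];
//   linearize(s, evil_stack_buffer, sizeof(evil_stack_buffer));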
[snip/]
I think I was asking why a string should default to encoding in UTF-8 when UTF-8 is really just one means of interpreting a sequence of bytes. Why a string would have to do that by default is what I don't understand -- which is why I see it as a decoupling of a view from the underlying string.
I know why UTF-* have their merits for encoding all the "characters" of the languages of the world. However, I don't see why it has to be the default for a string.
Because the byte sequence is interpreted into *text*. Let me try one more time: imagine that someone proposed an ultra-fast-and-generic type for handling floating-point numbers, with ~200 possible encodings for a float or double, and the usage of the type would be

uber_float x = get_x();
uber_float y = get_y();
uber_float z = view<acme_float_encoding_123_331_4342_Z>(x)
             + view<acme_float_encoding_123_331_4342_Z>(y);
uber_float w = third::party::math::log(view<acme_float_encoding_452323_X>(z));

Would you choose it to calculate your z = x + y and w = log(z) in the 98% of regular cases where you don't need to handle numbers with helluva-big scale/range/precision? I would not.
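And for contrast, the plain version that those 98% of regular cases actually want, with double standing in for a type that commits to one sensible default:

#include <cmath>

double get_x() { return 1.0; }   // placeholders for the analogy
double get_y() { return 2.0; }

int main()
{
    double x = get_x();
    double y = get_y();
    double z = x + y;            // no per-operation encoding views needed
    double w = std::log(z);
    (void)w;
    return 0;
}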
If this is not obvious: I live in a part of the world where ASCII is just not enough and, believe me, we are all tired here of juggling language-specific code pages :)
Nope, it's not obvious, but it's largely a matter of circumstance, really, I would say. ;)
But most of the world actually lives under these circumstances. :)
[snip/] BR, Matus