
On Sat, Aug 13, 2011 at 23:24, Robert Ramey <ramey@rrsd.com> wrote:
Dave Abrahams wrote:
std::string represents a sequence of "char" objects that happens to be useful for text processing. It can represent a text in any encoding.
The question is how we treat this sequence... And this is a matter of policy and requirements of the library.
I think I agree with Artyom here. *Somebody* has to decide how that datatype will be interpreted when we receive it. Unless we refuse altogether to accept std::string in our interfaces (which sounds like a bad idea to me), why not make the decision that it's UTF-8?
hmmm - why can't we just leave it at "std::string represents a sequence of 'char'" and define some derivative class which defines it as "a refinement of std::string which supports UTF-8 functionality"?
Because we are talking here about what 'a sequence of char' means, and you *must* define it somehow. Even when wrapping it you must still define the conversions from 'sequences of chars'. Here we come to the original problem.
On Mon, Aug 15, 2011 at 16:19, Stewart, Robert <Robert.Stewart@sig.com> wrote:
[...] As soon as the client did a cast, the client made the claim that non_utf_string met the requirements of the text class' constructor. The problem is that of the client misusing the class by an ill-advised cast. What's more, I think Soares indicated a debug-build validation that the argument indeed was UTF-8.
I don't see a problem in that design, once the constructor is explicit.
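(Roughly, the design being described might look like this; the names are only illustrative, and is_valid_utf8 is an assumed helper, one possible version of which is sketched further below.)

    #include <cassert>
    #include <string>

    bool is_valid_utf8(const std::string& s);   // assumed helper

    class text
    {
    public:
        // Explicit: the caller has to spell out the claim "these bytes are UTF-8".
        explicit text(const std::string& utf8) : bytes_(utf8)
        {
            assert(is_valid_utf8(bytes_));      // debug-build check, compiled out under NDEBUG
        }

        const std::string& utf8() const { return bytes_; }

    private:
        std::string bytes_;
    };

    // The "ill-advised cast" from the quote above:
    //   text t(some_cp1251_string);   // compiles; the claim is the caller's to get right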
I don't want to do any explicit casts. I want UTF-8 by default, at least as an optional feature for me and others who think like me. I can afford the risk of writing wrong code, which is really small if you know what you're doing. And I'm saying this as a maintainer of a ~1MLOC codebase which uses this convention on *windows*. Regarding UTF-8 validation, it's not bullet-proof: many non-UTF-8 sequences may pass the validation, and text in 8-bit encodings that don't coincide with ASCII is even more likely to produce false positives.
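(To make the false-positive point concrete, here is one possible simplified validation, my own sketch rather than the validation referred to above, together with inputs from other encodings that it accepts.)

    #include <cassert>
    #include <cstddef>
    #include <string>

    // Simplified structural check: lead-byte patterns and continuation bytes
    // only; a stricter validator would also reject overlong forms, surrogates
    // and code points above U+10FFFF.
    bool is_valid_utf8(const std::string& s)
    {
        for (std::size_t i = 0; i < s.size(); ) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t len;
            if      (c < 0x80)           len = 1;   // ASCII
            else if ((c & 0xE0) == 0xC0) len = 2;   // 110xxxxx
            else if ((c & 0xF0) == 0xE0) len = 3;   // 1110xxxx
            else if ((c & 0xF8) == 0xF0) len = 4;   // 11110xxx
            else return false;                      // stray continuation or invalid lead byte
            if (i + len > s.size()) return false;
            for (std::size_t j = 1; j < len; ++j)
                if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                    return false;                   // continuation byte must be 10xxxxxx
            i += len;
        }
        return true;
    }

    int main()
    {
        // ASCII-only text passes no matter which 8-bit encoding it came from.
        assert(is_valid_utf8("plain ASCII taken from a Latin-1 file"));

        // The two bytes 0xC2 0xA9 are the Windows-1252 characters "Â©", but
        // they are also the valid UTF-8 encoding of U+00A9, so this
        // Windows-1252 input slips through as a false positive.
        assert(is_valid_utf8("\xC2\xA9"));
        return 0;
    }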
Besides, it does not harm you in any way.
It does. I already use UTF-8 for all my strings, even on windows, and I don't want the code-bloat of all these conversions (even if they're no-ops).
What code bloat do you get from NOPs? Sure, there is more compilation time for the compiler to parse the text code and then for the optimizer to streamline it into a NOP, but even that is very likely negligible.
I'm talking about source-code bloat. About the boilerplate code I have to write even if I already use UTF-8 everywhere:

    std::string str = some_utf_8_string;
    boost::utf8_function(text(str));  // Yes, I like UTF-8
    boost2::utf8_function(str);       // but I like it more when it's the default.

-- Yakov