Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

30 May 2012

      ----- Original Message -----
...
From: Matus Chochlik <chochlik@gmail.com>
To: boost@lists.boost.org
Cc: 
Sent: Wednesday, May 30, 2012 10:20 AM
Subject: Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review
Hi Artyom,
On Mon, May 28, 2012 at 2:33 PM, Artyom Beilis <artyomtnk@yahoo.com> 
wrote:
...
I would like comments on a library that I want to submit for a formal review.
The library provides an implementation of standard C and C++ library
functions such that their inputs are UTF-8 aware on Windows, without
requiring the use of the Wide API to make a program work on Windows.
Here are my 0.02 Euro:
I completely agree that for general-purpose text storage and handling
(reading lines from text-file/console, reading user input from
GUI, displaying formatted (and localized) messages to the user
in a UI, etc., etc.) UTF-8 should *finally* be adopted.
The other encodings (including UCS-2, UTF-16/32) have their
uses, but should be treated as special cases.
The nowide library is certainly useful within the (limited) scope of working
with text obtained from the OS and passed to the OS where you
can make some assumptions and guess the encoding that the
OS uses and do the conversions from and to UTF8, BUT ...
many text-handling applications also tend to use third-party libraries
which have their own ideas about text encodings, and your library
would be *much* more useful if it allowed one to "talk" to such libraries
(or devices).
So let me reiterate some points I already mentioned in the earlier
text-related discussions here:
[snip]
1) Let's use std::string as an encoding-agnostic string... [snip]
2) Let's implement a text storage class (and let's call it) text;
This class would store text ... [snip]
[snip]
The encoding tags would specify both concrete encodings
like UTF-16 or ISO-8859-2, etc. and symbolic encodings
[snip]
I want to stop this direction and discussion before it begins.

This library is not a generic library to handle text in all encodings,
handle all possible 3rd-party libraries and convert between
them, and it is not intended to be one.

The potential users of this library do not want to handle 101 encodings.
They want to use ONE SINGLE encoding throughout their application,
convert the strings to the Wide encoding at Windows library boundaries, and
pass the UTF-8 strings as is in Unix programs.
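
To illustrate the boundary idea, here is a minimal sketch. The names
widen() and utf8_fopen() are hypothetical, made up for this example, and
are not claimed to be the library's actual functions. It assumes
MultiByteToWideChar and _wfopen on Windows, and plain fopen elsewhere.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
// Hypothetical helper: convert UTF-8 to UTF-16 using the Win32 API.
inline std::wstring widen(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());
    // First call asks for the required number of wide characters.
    const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
    std::wstring out(static_cast<std::size_t>(n), L'\0');
    if (n > 0) MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &out[0], n);
    return out;  // empty if the input was not valid UTF-8
}
#endif

// Hypothetical wrapper: the caller always passes UTF-8.
// On Windows the strings are widened and handed to the Wide API;
// on other platforms they are passed through unchanged.
inline std::FILE* utf8_fopen(const char* path, const char* mode) {
    assert(path != nullptr && mode != nullptr);
#ifdef _WIN32
    return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
    return std::fopen(path, mode);
#endif
}
```

The application code calls utf8_fopen() with the same UTF-8 string on every
platform; only the wrapper knows that Windows needs a wide string.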

Note: The developer who uses this library considers the ANSI
      API broken and the Wide API the only valid API on
      Windows.

So no, this library is not Boost.Text. It is:

 "I want to use UTF-8 in my application... and I want to use
  only Wide API on Windows as the only correct API to use"
...
[snip]
If I'm not terribly mistaken all the code for conversions between
encodings already is part of Boost.Locale.
Yes, and Boost.Nowide uses the UTF-to-UTF conversion part (the header-only
part of Boost.Locale).
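
For readers unfamiliar with what a UTF-to-UTF conversion involves, here is
a rough standard-library-only sketch of a strict UTF-8 to UTF-16 transcoder.
The function name utf8_to_utf16() is hypothetical and this is not
Boost.Locale's code or API; it only shows the kind of work being delegated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode as UTF-16.
// Throws std::invalid_argument on malformed input.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0;
        int trail = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        trail = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trail = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trail = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trail = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        for (int k = 0; k < trail; ++k) {
            if (i >= in.size()) throw std::invalid_argument("truncated UTF-8");
            const unsigned char c = static_cast<unsigned char>(in[i++]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3Fu);
        }

        // Reject overlong forms, surrogate code points and out-of-range values.
        static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
        assert(trail >= 0 && trail <= 3);
        if (cp < min_cp[trail] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```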
...
Then all the useful things like the nowide::args class and
the wrappers around iostreams, etc. could be implemented
on top of that.
The library does not reinvent the wheel :-),
it uses boost::locale::utf... (of which I am, BTW, the author)
...
Best,
Matus

Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/