Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

19 Jan 2011

      ...
From: Alexander Lamaison <awl03@doc.ic.ac.uk>
On Wed, 19 Jan 2011 16:13:04 +0100, Matus Chochlik  wrote:
...
...
I do not believe that UTF-8 is the way to go. In fact I know it is not,
except perhaps for the very near future for some programmers (Linux
advocates).
:-) Just for the record, I'm not a Linux advocate any more than I'm
a Windows advocate. I use both... I'm writing this on a Windows machine.
What I would like is the whole encoding madness/dysfunction (including
but not limited to the dual TCHAR/whateverchar-based interfaces) to stop.
Everywhere.
Even if I bought the UTF-8ed-Boost idea, what would we do about the STL
implementation on Windows which expects local-codepage narrow strings? Are
we hoping MS etc. change these to match? Because otherwise we'll be
converting between narrow encodings for the rest of eternity.
Alex
First of all, there **is** a problem today: STL code can't open
such files. Try to open "שלום-سلام-pease.txt" under Windows using
GCC's std::fstream. You can't.

I assume it happens with some other compilers as well.

There **is** a problem, and ignoring it will not help us.

How can we address the STL problem with UTF-8 in a simple way?

Provide:

   boost::basic_fstream
   boost::fopen
   boost::freopen
   boost::remove
   boost::rename

These would use the same std::* classes and functions on POSIX
platforms and UTF-8-aware implementations on Windows.

Take a look at this code:

  http://art-blog.no-ip.info/files/nowide.zip

This is the code I use in my projects. It implements what I'm
talking about: simple, easy to use, straightforward.

Also don't forget two things:

1. Microsoft has deprecated the ANSI API and does not recommend
   using it.

   If the only OS that gives us most of the encoding headaches
   has deprecated the ANSI API, I don't think that Boost should
   continue supporting it.

2. The whole world has already moved to Unicode, and Microsoft
   has as well.

   They did it in their incompatible-with-the-rest-of-the-world
   way, but they still did it. So we can either keep ignoring
   the fact that UTF-8 is the ultimate encoding, or go forward.

Artyom