
Ferdinand Prantl wrote:
Hello,
From: Vladimir Prus [mailto:ghost@cs.msu.su]
glib did a very good implementation of UTF-8 handling and Glibmm is a well done C++ wrapper, but it lacks the "standardness". Something like boost::ustring COULD bring a widely accepted UTF-8 aware unicode string to C++ programmers.
A somewhat relieving thought.
I am not exactly sure if UTF-8 or UCS-4 is better as a universal solution, but some solution is surely needed.
I am afraid there is no universal solution for all users. The easiest solution is based on the native basic_string<>, which is specialized for char (8-bit) to support ASCII/ANSI encodings and for wchar_t (16-bit) usually used for UCS-2 encoded strings. UCS-4 (32-bit) encoding would require another basic_string<> specialization.
UCS-2 held all characters in Unicode 1.1. There was a need for more unique numbers, and UCS-4 was introduced in Unicode 2.0. Unfortunately, there is no 4-byte character specialization for basic_string<> in the STL yet.
Technically, there isn't a 2-byte specialization either; wchar_t might not be 16 bits.

Bob
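
For what it's worth, both points can be checked in a few lines. This is only a rough sketch, and it assumes a C++11 or later compiler: char32_t was added to the language after this thread, so it is not something the posters had available. The names wchar_bits, ucs4_string and ascii_to_ucs4 are made up for illustration and do not come from Boost, glib or Glibmm.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on this platform. The standard does not fix it:
// it is commonly 16 on Windows and 32 on most Unix-like systems.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// A string whose elements can each hold one full UCS-4 code point.
// (Assumption: C++11 or later, where char32_t is a built-in type.)
using ucs4_string = std::basic_string<char32_t>;

// Hypothetical helper: widen plain ASCII text to UCS-4, one unit per char.
// It does no decoding, so it is only meaningful for 7-bit ASCII input.
inline ucs4_string ascii_to_ucs4(const std::string& ascii) {
    ucs4_string out;
    out.reserve(ascii.size());
    for (unsigned char c : ascii) {
        out.push_back(static_cast<char32_t>(c));
    }
    return out;
}
```

Printing wchar_bits() on two different platforms is the quickest way to see why code that treats wchar_t as a 16-bit UCS-2 unit is not portable. ascii_to_ucs4("abc") gives a three-element string with one code point per element.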