Re: [boost] Should we add two simple character-to-Unicode converters?

Hi,

Both:
int_fast32_t char_to_Unicode( char c );
int_fast32_t wchar_to_Unicode( wchar_t c )
will require processing of surrogates in order to be Unicode 4 compliant.

A Unicode library is currently under development that will give access to the surrogate ranges directly from the UCD to allow this to be done properly. Please bear with us.

Yours,
Graham
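To illustrate the surrogate point above: where wchar_t is 16 bits wide (as on Windows), a code point above U+FFFF arrives as two code units, so a function taking a single wchar_t cannot return it. A minimal sketch of the pairing step follows. The helper names are hypothetical and not part of any Boost library, and the ranges are hard-coded from the Unicode Standard rather than read from the UCD as the library under development is said to do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.
// High (lead) surrogates occupy U+D800..U+DBFF.
inline bool is_high_surrogate(std::uint_fast32_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// Low (trail) surrogates occupy U+DC00..U+DFFF.
inline bool is_low_surrogate(std::uint_fast32_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine a UTF-16 surrogate pair into one code point in
// U+10000..U+10FFFF. The caller must pass a valid pair.
inline std::int_fast32_t combine_surrogates(std::uint_fast32_t high,
                                            std::uint_fast32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return static_cast<std::int_fast32_t>(
        0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00));
}
```

For example, combine_surrogates(0xD83D, 0xDE00) yields 0x1F600. A per-character interface would need some way to carry the high surrogate over to the next call, which is the difficulty being raised.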
Message: 13
Date: Fri, 19 Aug 2005 17:15:03 +0400
From: Vladimir Prus <ghost@cs.msu.su>
Subject: Re: [boost] Should we add two simple character-to-Unicode converters?
To: boost@lists.boost.org
Message-ID: <de4m0n$aq9$1@sea.gmane.org>
Content-Type: text/plain; charset=us-ascii
Daryle Walker wrote:
Nothing fancy, just something like:
int_fast32_t char_to_Unicode( char c );
int_fast32_t wchar_to_Unicode( wchar_t c );
that converts a native character to a Unicode value.
Maybe, but it's hard to comment as you haven't even explained what those
functions will do. What's a "native character", what's a "Unicode value",
and how will the conversion be done? If the first function converts from
the local 8-bit encoding to Unicode, then:
- do you have a working implementation?
- isn't dealing with individual characters too slow?
- Volodya
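For concreteness, here is one possible reading of the first function, as a sketch only. It assumes the local 8-bit encoding is ISO-8859-1 (Latin-1), whose byte values coincide with code points U+0000..U+00FF. The proposal does not say which encoding is meant; any other local encoding would need a lookup table or a locale facet instead.

```cpp
#include <cassert>
#include <cstdint>

// Sketch under a stated assumption: the native 8-bit encoding is
// ISO-8859-1, so each byte value is already its Unicode code point.
std::int_fast32_t char_to_Unicode(char c)
{
    // Convert through unsigned char first; where plain char is signed,
    // a byte such as 0xE9 would otherwise sign-extend to a negative value.
    const unsigned char byte = static_cast<unsigned char>(c);
    const std::int_fast32_t code_point = byte;

    // Latin-1 only covers U+0000..U+00FF.
    assert(code_point >= 0 && code_point <= 0xFF);
    return code_point;
}
```

Under that assumption char_to_Unicode('A') gives 0x41 and the byte 0xE9 gives 0xE9 (U+00E9). Whether one call per character is fast enough is a separate question that this sketch does not answer.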
Graham Barnett BEng, ACGI, MCSD/ MCAD .Net, MCSE/ MCSA 2003, CompTIA Sec+

On 8/19/05 2:33 PM, "Graham" <Graham@system-development.co.uk> wrote:
Both:
int_fast32_t char_to_Unicode( char c );
int_fast32_t wchar_to_Unicode( wchar_t c )
will require processing of surrogates in order to be Unicode 4 compliant.
I thought of these functions while considering how Wave processes the various phases of C++ translation (see section 2.1 of the standard). I wanted the conversion to be one native character to one code-point because that is how Phase 1 implies it[1]. If you don't think that's right, maybe we should file a defect with the Standard committee.
A Unicode library is currently under development that will give access to the surrogate ranges directly from the UCD to allow this to be done properly.
[1] In other words, any extended native character (i.e. not a character C++ uses for parsing) must be mapped to one C++ Unicode name, which maps to a single code-point.
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
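As an illustration of footnote [1]: in translation Phase 1, a source character outside the basic source character set is replaced by the universal-character-name that designates it, written \uXXXX or \UXXXXXXXX. The sketch below formats a code point in that notation. The function name is hypothetical, and this is not how Wave or any compiler is known to implement the step.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: spell a code point as a C++
// universal-character-name, using the short \uXXXX form when the
// value fits in four hex digits and \UXXXXXXXX otherwise.
std::string to_universal_character_name(std::int_fast32_t code_point)
{
    assert(code_point >= 0 && code_point <= 0x10FFFF);

    char buffer[11]; // backslash + 'U' + 8 hex digits + terminating NUL
    const unsigned long value = static_cast<unsigned long>(code_point);
    if (code_point <= 0xFFFF)
        std::snprintf(buffer, sizeof buffer, "\\u%04lX", value);
    else
        std::snprintf(buffer, sizeof buffer, "\\U%08lX", value);
    return std::string(buffer);
}
```

With this, code point 0xE9 is spelled \u00E9 and 0x1F600 is spelled \U0001F600, so each extended character corresponds to exactly one name and one code point.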
participants (2)
- Daryle Walker
- Graham