Should we add two simple character-to-Unicode converters?

Nothing fancy, just something like:

    int_fast32_t char_to_Unicode( char c );
    int_fast32_t wchar_to_Unicode( wchar_t c );

that converts a native character to a Unicode value. They would need a separate source file that contains #if blocks for each platform. Maybe they can start a namespace with "utf8_codecvt_facet"?

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
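For readers following the thread, here is a rough sketch of how the proposed single-character converter with per-platform #if blocks might look. Nothing below comes from an actual implementation: the macro EXAMPLE_NATIVE_IS_MACROMAN is a hypothetical name, and the default branch simply assumes the native 8-bit set is Latin-1, where byte values happen to equal Unicode code points.

```cpp
#include <cstdint>

// Hypothetical platform switch; the macro name is an assumption made up
// for this sketch, not something defined by Boost or any compiler.
#if defined(EXAMPLE_NATIVE_IS_MACROMAN)
// A MacRoman build would need a 128-entry lookup table for bytes 0x80-0xFF.
#error "MacRoman table omitted from this sketch"
#else
// Assumed default: the native 8-bit character set is Latin-1 (ISO 8859-1),
// whose byte values coincide with Unicode code points U+0000..U+00FF.
inline std::int_fast32_t char_to_Unicode(char c)
{
    // Convert through unsigned char so bytes above 0x7F do not turn negative
    // on platforms where plain char is signed.
    return static_cast<std::int_fast32_t>(static_cast<unsigned char>(c));
}
#endif
```

Other native character sets would presumably replace the cast with a table lookup inside their own #if branch.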

Daryle Walker wrote:
> Nothing fancy, just something like:
>
>     int_fast32_t char_to_Unicode( char c );
>     int_fast32_t wchar_to_Unicode( wchar_t c );
>
> that converts a native character to a Unicode value.

Maybe, but it's hard to comment, as you haven't even explained what those functions will do. What is a "native character", what is a "Unicode value", and how will the conversion be done? If the first function converts from the local 8-bit encoding to Unicode, then:

- do you have a working implementation?
- isn't dealing with individual characters too slow?

- Volodya

On 8/19/05 9:15 AM, "Vladimir Prus" <ghost@cs.msu.su> wrote:
> Daryle Walker wrote:
>> Nothing fancy, just something like:
>>
>>     int_fast32_t char_to_Unicode( char c );
>>     int_fast32_t wchar_to_Unicode( wchar_t c );
>>
>> that converts a native character to a Unicode value.
>
> Maybe, but it's hard to comment, as you haven't even explained what those functions will do. What is a "native character", what is a "Unicode value", and how will the conversion be done? If the first function converts from the local 8-bit encoding to Unicode, then:

"Native characters" are the character set a particular platform uses. Before the Unicode era, a platform could assume that all text files used the platform's character set (e.g. MacRoman for pre-X Macs, Cp-1251 for Windows, Latin-1 for UNIX). My functions assume a one-to-one mapping from a native character to a Unicode code point, because Phase 1 of C++ translation (see section 2.1 of the standard) assumes that.

> - do you have a working implementation?

No. I'm just requesting comments.

> - isn't dealing with individual characters too slow?

Probably. Maybe we could add an iterator-copying version.

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

The "Dataflow Iterators" section of the serialization library contains iterators for this purpose. It's been part of Boost since last November.

Robert Ramey
participants (3)
- Daryle Walker
- Robert Ramey
- Vladimir Prus