Review Request: Boost.Locale


Artyom

23 May 2010

8:55 p.m.

Hello,

I want to request a formal review for the Boost.Locale library.

Short Description:
------------------

Boost.Locale is a powerful localization library that extends the existing built-in C++ localization facilities in a Unicode-aware way.

Documentation:
--------------

- Reference: http://cppcms.sourceforge.net/boost_locale/html/index.html
- Tutorial: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html

Sources:
--------

- https://sourceforge.net/projects/cppcms/files/boost_locale/boost_locale_for_...

Tested Compilers:
-----------------

- GCC 3.4, 4.1, 4.2, 4.3, 4.4 (including C++0x support of new Unicode characters)
- MSVC 2008
- Intel 11.0
- Sun Studio 12 with STLPort

Tested Platforms:
-----------------

- Linux 2.6
- FreeBSD 8.0
- OpenSolaris
- Windows XP/SP2: including MSVC, MinGW and Cygwin

Updates from previous versions:
-------------------------------

- Redesigned context information support in message catalogs: switched to native gettext msgctxt support.
- Cleanup of MSVC warnings.
- MSVC8 fixes.
- Build cleanup.

Thank You,
Artyom Beilis


Robert Ramey

24 May

5:23 a.m.

Artyom wrote:

...

Hello,

I want to request a formal review for Boost.Locale library.

Short Description: ------------------

Boost.Locale is powerful localization library that provides powerful localization tool extending existing built-in C++ localization facilities in Unicode aware way.

Documentation: --------------

- Reference: http://cppcms.sourceforge.net/boost_locale/html/index.html
- Tutorial: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html

I just skimmed over the documentation. It looks interesting enough to spend some more time with. One thing in particular that I was interested in is codecvt facets. I didn't see anything on this. Why is that? Is this a separate subject, or is it that you believe they're not useful? I would be curious to hear your views as to where they fit in (or don't fit in).

Robert Ramey

Artyom

6:10 a.m.

...

One thing in particular that I was interested in is codecvt facets. I didn't see anything on this. Why is that?

Take a look at http://www.cplusplus.com/reference/std/locale/codecvt/

They allow you to imbue an fstream with a specific charset and automatically translate wide characters to a narrow encoding like UTF-8 or ISO-8859-8.

...

Is this a separate subject or is that you believe they're not useful.

Theoretically they are very useful.

For example:

    std::wofstream fs;
    fs.imbue(std::locale("he_IL.UTF-8"));
    fs.open("file.txt");
    fs << L"שלום";

would write UTF-8 output.

But...

- Many compilers/standard libraries do not implement locales at all (GCC under Windows and Solaris, the STLPort library).
- Support for locales and encodings is strictly limited by the OS configuration. So on some hosts the above example would work, and on others it would throw an invalid locale error.
- Some compilers/OSes do not support UTF-8 encodings (MSVC), so you can't create a UTF-8 locale at all.
- Locale names are platform dependent. For example, under Windows you need the Hebrew_Israel.1255 locale and under Linux he_IL.ISO-8859-8 (and BTW 1255 != ISO-8859-8).

So Boost.Locale reimplements the standard codecvt facet to make this work on any platform.

However, there is still a limitation when working with 2-byte characters (i.e. char16_t, or wchar_t under Windows), as Boost.Locale would work correctly only with UCS-2. But this is actually a limitation of the C++ standard.

Best,
Artyom
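The host-dependence of named locales described above can be probed with the standard library alone. The following is only a minimal sketch: `locale_available` is a hypothetical helper name (it is not part of Boost.Locale or of the standard), and it relies on the fact that the `std::locale` constructor throws `std::runtime_error` when the name is not recognized.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (illustration only): reports whether the standard
// library on this host can construct the named locale.
bool locale_available(const std::string& name)
{
    try {
        std::locale loc(name.c_str());  // throws std::runtime_error if unknown
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling `locale_available("he_IL.UTF-8")` and `locale_available("Hebrew_Israel.1255")` on different hosts would be expected to give different answers, which is the portability problem in question; only the "C" locale is guaranteed to exist everywhere.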

Robert Ramey

8:47 a.m.

Artyom wrote:

...

...
One thing in particular that I was interested in is codecvt facets. I didn't see anything on this. Why is that?

Take a look at http://www.cplusplus.com/reference/std/locale/codecvt/

They allow you to imbue an fstream with a specific charset and automatically translate wide characters to a narrow encoding like UTF-8 or ISO-8859-8.

...
Is this a separate subject or is that you believe they're not useful.

Theoretically they are very useful.

For example:

    std::wofstream fs;
    fs.imbue(std::locale("he_IL.UTF-8"));
    fs.open("file.txt");
    fs << L"שלום";

Would print UTF-8 output.

I'm familiar with the codecvt facet as several of them are used with the serialization library.

...

But...

- Many compilers/standard libraries do not implement locales at all (GCC under Windows and Solaris, the STLPort library).
- Support for locales and encodings is strictly limited by the OS configuration. So on some hosts the above example would work, and on others it would throw an invalid locale error.
- Some compilers/OSes do not support UTF-8 encodings (MSVC), so you can't create a UTF-8 locale at all.
- Locale names are platform dependent. For example, under Windows you need the Hebrew_Israel.1255 locale and under Linux he_IL.ISO-8859-8 (and BTW 1255 != ISO-8859-8).

I think this is a separate issue from the codecvt facet. I've found them to work with all C++ implementations that Boost uses.

...

So Boost.Locale reimplements standard codecvt facet to make this work on any platform.

I didn't see anything in the documentation about that.

...

However there is still a limitation when working with 2 byte characters (ie char16_t or wchar_t under windows) as Boost.Locale would work correctly only with UCS-2

...

But this is actually C++ standard's limitation.

The reason I ask is that I often see things requested on the list which I think could be better implemented as codecvt facets. Also, it seemed to me that a large part of the iostreams library could have been implemented more efficiently with codecvt facets. Admittedly, it's somewhat unobvious how to make the best use of this (it needs another library and documentation, of course), but I'm surprised that they don't seem to be mentioned at all in the documentation.

Robert Ramey

Artyom

8:06 a.m.

...

...
So Boost.Locale reimplements standard codecvt facet to make this work on any platform.

I didn't see anything in the documentation about that.

http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#b6d767d2f8ed5d...

...

...
But this is actually C++ standard's limitation.

The reason I ask is that I often see things requested on the list which I think could be better implemented as codecvt facets. Also, it seemed to me that a large part of the iostreams library could have been implemented more efficiently with codecvt facets. Admittedly, it's somewhat unobvious how to make the best use of this (it needs another library and documentation, of course)

Under the hood Boost.Locale uses the ICU ucnv_* API to do this, so you can convert any stateless character set. But this is actually an implementation detail. From the user's point of view, he just needs to imbue the correct locale and the miracle happens.

But due to the limitations of the std::codecvt facet, this implementation is far from efficient, mostly because every implementation is required to be stateless, which is quite unfortunate.

Several additional points:

1. Codecvt facets are not actually as useful as they seem to be. For example, you can't convert UTF-8 <-> ISO-8859-8 using codecvt without passing through wide characters.
2. std::codecvt was really badly designed: it does not provide any information about std::mbstate_t and how it works, which makes it very hard to implement anything efficiently, and it does not allow storing any information. So only the standard library designer can provide something, but not a user who extends it.
3. For example, all Boost code that implements a UTF-8 codecvt facet supports only UCS-2 under Windows, and not UTF-16.

I don't put too much light on it, as I don't think that codecvt facets should be widely used, as they are very problematic.

Best,
Artyom

Gevorg Voskanyan

10:53 a.m.

Artyom wrote:

...

However there is still a limitation when working with 2 byte characters (ie char16_t or wchar_t under windows) as Boost.Locale would work correctly only with UCS-2

...

But this is actually C++ standard's limitation.

Artyom, I've always wondered about this, so will take this chance for clarification. Isn't this rather windows compilers' non-compliance? 3.9.1/5 "Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). Type wchar_t shall have the same size, signedness, and alignment requirements (3.9) as one of the other integral types, called its underlying type." So _if_ any supported locale can handle characters outside of BMP, the implementation will qualify as non-conforming according to this paragraph. I know that windows OS itself can handle such characters, so I'd expect supported std locales to be able to handle them as well, but I haven't checked that. Any clarification on this matter would be appreciated. Thank you, Gevorg

Artyom

11:53 a.m.

...

Artyom,

So _if_ any supported locale can handle characters outside of BMP, the implementation will qualify as non-conforming according to this paragraph. I know that windows OS itself can handle such characters, so I'd expect supported std locales to be able to handle them as well, but I haven't checked that.

Any clarification on this matter would be appreciated.

I was talking about the **codecvt facet only**, i.e. conversion via a locale imbued into a file stream, and this is due to a limitation in the definition of the codecvt facet, that's it.

Direct conversion functions like to_utf/from_utf do not have this limitation.

All of Boost.Locale supports wide characters, and it supports characters outside of the BMP for UTF-16 encoded strings. If it didn't, it would be absolutely useless software.

So don't worry...

As a matter of fact, Boost.Locale supports:

- narrow (normal) characters: char for 8-bit locales like ISO-8859-8.
- narrow (normal) characters: char for variable-length locales like UTF-8 or even Shift-JIS.
- wide characters: wchar_t for both UTF-16 (Windows) and UTF-32 (POSIX) encodings.
- C++0x char16_t/char32_t for UTF-16/UTF-32, if available.

And of course the support covers all of Unicode, from 0 to 10FFFF. Everything is fully supported.

The only issue that exists is codepage conversion via the standard std::locale::codecvt facet, due to its limitation.

And BTW, I do not recommend using it widely, as it has some other issues as well.

Best,
Artyom

Gevorg Voskanyan

12:36 p.m.

...

I was talking about the **codecvt facet only**, i.e. conversion via a locale imbued into a file stream, and this is due to a limitation in the definition of the codecvt facet, that's it.

What is the concrete limitation of codecvt specification that prevents creating a codecvt facet that converts UTF-16 to-from UTF-8? I just re-read 22.2.1.5 but wasn't able to see it.

...

Direct conversion functions like to_utf/from_utf do not have this limitation.

All of Boost.Locale supports wide characters, and it supports characters outside of the BMP for UTF-16 encoded strings. If it didn't, it would be absolutely useless software.

Good to hear. Yes, I agree it's very important. boost::detail::utf8_codecvt_facet fails that test, at least on windows, but I'm wondering what is the fundamental restriction it can't be patched to support them?

...

So don't worry...

As matter of fact Boost.Locale supports:

- narrow (normal) characters: char for 8-bit locales like ISO-8859-8.
- narrow (normal) characters: char for variable-length locales like UTF-8 or even Shift-JIS.
- wide characters: wchar_t for both UTF-16 (Windows) and UTF-32 (POSIX) encodings.
- C++0x char16_t/char32_t for UTF-16/UTF-32, if available.

For the original (non-compliance) point I raised it would be interesting to see how well codecvt< char32_t, char, std::mbstate_t > is going to be implemented under windows :) BTW, I see some interesting additions to codecvts in n3090, 22.5. Any plans to implement them in Boost.Locale?

...

And of course the support is of all Unicode from 0 to 10FFFF. Everything is fully supported.

The only issue that exists is codepage conversion via the standard std::locale::codecvt facet, due to its limitation.

And BTW, I do not recommend using it widely, as it has some other issues as well.

Non-iterator interface is a real pain in using codecvt, I admit.

...

Best, Artyom

Best Regards, Gevorg

Artyom

1:06 p.m.

...

What is the concrete limitation of codecvt specification that prevents creating a codecvt facet that converts UTF-16 to-from UTF-8? I just re-read 22.2.1.5 but wasn't able to see it.

Good to hear. Yes, I agree it's very important. boost::detail::utf8_codecvt_facet fails that test, at least on windows, but I'm wondering what is the fundamental restriction it can't be patched to support them?

The standard defines (from my memory) the following:

- The conversion can be performed by converting wide characters one by one, i.e. an implementation should work even if only one wide character is given (and BTW MSVC indeed converts one character at a time).
- There is absolutely no information given about std::mbstate_t, which should save intermediate data between conversions, so there is actually no way to pass anything between sequential calls of std::locale::codecvt<...>::in/out. So even if I observe the first half of a surrogate pair, there is no way to pass this information to the next call, and thus I lose this information.

This is exactly the reason you can't implement UTF-8 <-> UTF-16 codepage conversion using a codecvt facet.

On the other hand, there is no such limitation for UTF-32 encodings, as there is no information to preserve between calls.

Additional note: it is also not possible to convert stateful encodings like UTF-7, as there is no way to move state around.

So generally std::locale::codecvt is not well designed to be derived from, so the only way to do stream conversion correctly is to redesign this facet, but in that case you can't use it with the std::iostreams library.
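To make the surrogate-pair problem concrete, here is a small sketch of UTF-16 encoding for a single code point. The function name `encode_utf16` is hypothetical and chosen only for this illustration; the arithmetic itself is the standard UTF-16 encoding rule. A code point above U+FFFF produces two code units, so a converter that is asked to handle one wide character at a time has to remember one half between calls.

```cpp
#include <cstdint>
#include <vector>

// Illustration only: encode one Unicode code point (0..0x10FFFF, not itself
// a surrogate) as UTF-16 code units.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        // Inside the BMP: a single code unit, nothing to remember.
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Outside the BMP: split the 20 remaining bits into a high and a
        // low surrogate; both are needed to reconstruct the code point.
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, U+05E9 (a Hebrew letter) yields one unit, while U+1F600 yields the pair 0xD83D, 0xDE00. With UTF-32 every code point is a single unit, which is why the limitation does not arise there.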

...

For the original (non-compliance) point I raised it would be interesting to see how well codecvt< char32_t, char, std::mbstate_t > is going to be implemented under windows :)

There is no problem to implement it correctly.

...

BTW, I see some interesting additions to codecvts in n3090, 22.5. Any plans to implement them in Boost.Locale?

In the same vein, when char32_t/char16_t become available, hopefully these facets will be implemented. But today it is impossible to implement UTF-16 codecvt facets.

My personal opinion: avoid wide characters and any "Unicode" characters, because they are the best way to fool yourself about "Unicode" support; in reality they do not provide any advantage over plain char and UTF-8 encodings.

So, unless you are using the Win32 API, avoid wide characters. However, too many programmers would disagree with me, especially Windows programmers who grew up on the "Unicode" and "Wide" API. So Boost.Locale fully supports wide characters.

...

Non-iterator interface is a real pain in using codecvt, I admit.

I think the best interface would be something like a boost::iostreams filter, but I think this should be part of the iostreams library rather than of localization. Also, it should not pass through a wide encoding in the middle when converting UTF-8 to ISO-8859-8.

But that is a different story.

For simple string conversion, boost::locale provides from_utf/to_utf, which work correctly with UTF-8/16/32.

Artyom

Andrey Semashev

3:24 p.m.

On 05/24/2010 05:06 PM, Artyom wrote:

...

...
- There is absolutely no information given about std::mbstate_t, which should save intermediate data between conversions, so there is actually no way to pass anything between sequential calls of std::locale::codecvt<...>::in/out. So even if I observe the first half of a surrogate pair, there is no way to pass this information to the next call, and thus I lose this information.

Well, that's not exactly true. mbstate_t is defined by the C standard, and indeed, it says pretty much nothing about its nature, except that it's not an array. But on any platform I worked with (including Windows) it's an integer.

I think it is perfectly fair to assume that it is at least a POD and sizeof(mbstate_t) >= 1, which makes it possible to store information about surrogate pairs in it.

The C++ standard does give some hints regarding how the conversion state shall be handled by the stream. In particular, it specifies that the state will be value-initialized at the beginning of the conversion, and that it will call `unshift` at the end of the conversion in order to finalize the converted character sequence and return the state to its initial value.

Not that it makes it easier to use mbstate_t with ICU under the hood, but it seems possible (theoretically, at least) to implement the complete UTF-16 <-> char conversion with it.

PS: I don't pretend that I'd learned the standards by heart. All the references are off the top of my head. :)
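The idea of parking surrogate information in the conversion state could look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names `stash_pending`, `pending` and `clear_pending` are hypothetical, the code assumes mbstate_t is a POD with room for one 16-bit unit (checked by the `static_assert`), and a state written this way is only meaningful to the facet that wrote it, never to the C library or to a standard facet.

```cpp
#include <cstdint>
#include <cstring>
#include <cwchar>

// Assumption made explicit: the state object can hold one UTF-16 code unit.
static_assert(sizeof(std::mbstate_t) >= sizeof(std::uint16_t),
              "mbstate_t too small to hold a pending UTF-16 code unit");

// Hypothetical helper: remember a code unit that could not be emitted yet.
void stash_pending(std::mbstate_t& st, std::uint16_t unit)
{
    std::memcpy(&st, &unit, sizeof unit);
}

// Hypothetical helper: read back the remembered unit (0 means "none").
std::uint16_t pending(const std::mbstate_t& st)
{
    std::uint16_t unit = 0;
    std::memcpy(&unit, &st, sizeof unit);
    return unit;
}

// Hypothetical helper: return the state to its zero-valued initial form.
void clear_pending(std::mbstate_t& st)
{
    st = std::mbstate_t();
}
```

Using 0 as the "nothing pending" marker works here because a surrogate code unit is never 0, and because a value-initialized state is zero-valued.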

Artyom

7:15 p.m.

...

Well, that's not exactly true. mbstate_t is defined by the C standard, and indeed, it says pretty much nothing about its nature, except that it's not an array. But on any platform I worked with (including Windows) it's an integer.

Under Linux it is a structure, and AFAIK gcc uses iconv for conversion. So I'm not sure how safe it is to write anything to it.

...

I think, it is perfectly fair to assume that it is at least a POD and sizeof(mbstate_t) >= 1, which makes it possible to store information about surrogate pairs in it.

The C++ standard does give some hints regarding how the conversion state shall be handled by the stream. In particular, it specifies that the state will be value-initialized at the beginning of the conversion, and that it will call `unshift` at the end of the conversion in order to finalize the converted character sequence and return the state to its initial value.

Not that it makes it easier to use mbstate_t with ICU under the hood, but it seems possible (theoretically, at least) to implement the complete UTF-16 <-> char conversion with it.

I was thinking about it, but unfortunately the standard does not specify how mbstate_t is initialized. If I could assume that it is at least a POD filled with zeros I could do something, but I actually can't. At least I didn't find any reference for this.

Andrey Semashev

25 May

5:12 p.m.

On 05/24/2010 11:15 PM, Artyom wrote:

...

...
Well, that's not exactly true. mbstate_t is defined by the C standard, and indeed, it says pretty much nothing about its nature, except that it's not an array. But on any platform I worked with (including Windows) it's an integer.

Under Linux it is a structure, and AFAIK gcc uses iconv for conversion.

So I'm not sure how safe it is to write anything to it.

Ah, right. I forgot about Linux. But still it's POD and can hold an integral value. How it is used by the standard facet is not relevant as long as you don't interchange states between your facet and the standard one.

...

...
The C++ standard does give some hints regarding how the conversion state shall be handled by the stream. In particular, it specifies that the state will be value-initialized at the beginning of the conversion, and that it will call `unshift` at the end of the conversion in order to finalize the converted character sequence and return the state to its initial value.

I was thinking about it, but unfortunately the standard does not specify how mbstate_t is initialized. If I could assume that it is at least a POD filled with zeros I could do something, but I actually can't.

It is POD since it's defined by the C standard.

...

At least I didn't find any reference for this.

The C standard describes that the zero-valued mbstate_t shall count as an initial state. From n1256:

7.24.6 Extended multibyte/wide character conversion utilities
...
3 The initial conversion state corresponds, for a conversion in either direction, to the beginning of a new multibyte character in the initial shift state. A zero-valued mbstate_t object is (at least) one way to describe an initial conversion state. A zero-valued mbstate_t object can be used to initiate conversion involving any multibyte character sequence, in any LC_CTYPE category setting.
...

Also, there is the mbsinit function, which allows detecting whether the state has the initial value (just in case there are other initial values, other than zero-filled).

Next, for do_in/do_out the C++ standard says (22.2.1.5.2):

1 Preconditions: [...] state initialized, if at the beginning of a sequence, or else equal to the result of converting the preceding characters in the sequence.

and further on, in paragraph 5 (regarding do_unshift), there is a footnote that explains that the method is intended to return the state to the initial value (typically, stateT()).

Artyom

6:20 p.m.

...

The C standard describes that the zero-valued mbstate_t shall count as an initial state. From n1256:

7.24.6 Extended multibyte/wide character conversion utilities

...

3 The initial conversion state corresponds, for a conversion in either direction, to the beginning of a new multibyte character in the initial shift state. A zero-valued mbstate_t object is (at least) one way to describe an initial conversion state. A zero-valued mbstate_t object can be used to initiate conversion involving any multibyte character sequence, in any LC_CTYPE category setting.

...

OK, that is interesting: it actually allows me to keep what I need if I know it is zeroed. Thank you, I'll take a look at it.

...

Also, there is the mbsinit function that allows to detect if the state has the initial value (just in case there are other initial values, other than zero-filled).

I can't really rely on mbsinit, as it is not available in all CRTLs.

...

and further on, in the paragraph 5 (regarding do_unshift), there is a footnote that explains that the method is intended to return the state to the initial value (typically, stateT()).

That is probably the most misleading point. If state is a POD then stateT() does nothing!

Thank you for your points. I'll take a look at what can be done.

Artyom

Andrey Semashev

7:14 p.m.

On 25.05.2010 22:20, Artyom wrote:

...

...
Also, there is the mbsinit function that allows to detect if the state has the initial value (just in case there are other initial values, other than zero-filled).

I can't really rely on mbsinit, as it is not available in all CRTLs.

This function is standard C. It should be present in any CRTL.

...

...
and further on, in the paragraph 5 (regarding do_unshift), there is a footnote that explains that the method is intended to return the state to the initial value (typically, stateT()).

That is probably the most misleading point.

If state is POD then stateT() does nothing!

It is called value-initialization and it zeroes the POD value.
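A short sketch of the rule being referred to: for a POD type T, the expression T() performs value-initialization, which zero-initializes the object. The struct `pod_state` below is a hypothetical stand-in for an implementation's state type, and the function names are chosen only for this illustration; `std::mbsinit` is the standard C function mentioned earlier in the thread.

```cpp
#include <cwchar>

// Hypothetical POD standing in for some implementation's state type.
struct pod_state {
    int count;
    unsigned value;
};

// pod_state() value-initializes the temporary, so both members are zero.
// (By contrast, a local declared as "pod_state s;" would be left
// uninitialized.)
inline pod_state make_zeroed_state()
{
    return pod_state();
}

// The same syntax applied to std::mbstate_t yields a zero-valued object,
// which the C standard accepts as an initial conversion state.
inline bool fresh_mbstate_is_initial()
{
    std::mbstate_t st = std::mbstate_t();
    return std::mbsinit(&st) != 0;
}
```

This is why the footnote's stateT() is a meaningful way to obtain the initial state even when stateT is a plain C struct.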

Artyom

7:38 p.m.

...

...
...
intended to return the state to the initial value (typically, stateT()).

That is probably the most misleading point.

If state is POD then stateT() does nothing!

It is called value-initialization and it zeroes the POD value.

Thank you! I feel stupid. I didn't know about this C++ feature.

Artyom

Gevorg Voskanyan

24 May

5:57 p.m.

Artyom wrote:

...

- There is absolutely no information given about std::mbstate_t, which should save intermediate data between conversions, so there is actually no way to pass anything between sequential calls of std::locale::codecvt<...>::in/out. So even if I observe the first half of a surrogate pair, there is no way to pass this information to the next call, and thus I lose this information.

Ah, yes, mbstate_t. It may be good enough for UTF-8 (multibyte sequence) but may not be usable for UTF-16 (multi-wchar_t sequence :-) on windows). Thanks, that fully explains it.

...

This is exactly the reason you can't implement utf-8 - utf-16 codepage conversion using codecvt facet.

And still codecvt<char16_t, char, mbstate_t> converts between UTF-8 and UTF-16 in C++11. That seems to suggest the new standard will require mbstate_t to be usable for UTF-16 as well.

...

On the other hand there is no such limitations for utf-32 encodings as there is no information to preserve between calls.

Additional note: it is also not possible to convert stateful encodings like UTF-7, as there is no way to move state around.

So generally std::locale::codecvt is not well designed to be derived from, so the only way to do stream conversion correctly is to redesign this facet, but in that case you can't use it with the std::iostreams library.

Yes, I see.

...

...
For the original (non-compliance) point I raised it would be interesting to see how well codecvt< char32_t, char, std::mbstate_t > is going to be implemented under windows :)

There is no problem to implement it correctly.

My point is that, if that is implemented correctly, then strictly speaking an implementation where wchar_t is 16 bits wide (sizeof(wchar_t) == 2) will become non-conforming according to 3.9.1/5. Which would be interesting to see :)

As intended by the standard, wchar_t should have at least 21 bits for C++ implementations supporting Unicode, but of course that isn't going to be fixed for Windows compilers in the foreseeable future.

...

...
BTW, I see some interesting additions to codecvts in n3090, 22.5. Any plans to implement them in Boost.Locale?

On same wave, when char32_t/char16_t would be available, hopefully these facets would be implemented. But today it is impossible to implement utf-16 codecvt facets.

You're right, implementing them would require implementation-specific knowledge about std::mbstate_t.

...

My personal opinion: avoid wide characters and any "Unicode" characters, because they are the best way to fool yourself about "Unicode" support; in reality they do not provide any advantage over plain char and UTF-8 encodings.

So, unless you are using the Win32 API, avoid wide characters. However, too many programmers would disagree with me, especially Windows programmers who grew up on the "Unicode" and "Wide" API. So Boost.Locale fully supports wide characters.

Despite having started as a Windows programmer myself, I don't disagree with you on this point. On the contrary, I've always been uncomfortable with the Windows A/W API, and would've much preferred UTF-8 instead, as is the case in the *nix world. Another reason I am still forced to use wide characters is wxWidgets, which (in its 2.x releases) assumes ANSI unless wxUSE_UNICODE is defined to a non-zero value, in which case it uses wide characters in its API, essentially following the Windows model. Fortunately, this is going to change in the soon-to-be-released wxWidgets 3.0, which will have a UTF-8 interface.

...

...
Non-iterator interface is a real pain in using codecvt, I admit.

I think the best interface would be something like a boost::iostreams filter, but I think this should be part of the iostreams library rather than of localization. Also, it should not pass through a wide encoding in the middle when converting UTF-8 to ISO-8859-8.

But that is different story.

For simple string conversion boost::locale provides from_utf/to_utf that work correctly with utf-8/16/32.

Looking forward to Boost.Locale review!

...

Artyom

Artyom, thank you very much for providing your insightful ideas satisfying my curiosity! Best Regards, Gevorg

Artyom

7:18 p.m.

...

My point is that, if that is implemented correctly, then strictly speaking an implementation where wchar_t is 16 bits wide (sizeof(wchar_t) == 2) will become non-conforming according to 3.9.1/5. Which would be interesting to see :)

No, it will not, as Microsoft would not agree. This is why C++0x gives us char16_t and char32_t.

...

As intended by the standard wchar_t should have at least 21 bits for C++ implementations supporting Unicode,

AFAIK wchar_t is actually defined by C, and C allows even sizeof(wchar_t) == 1.

...

Looking forward to Boost.Locale review!

Me too :-) Artyom

Gevorg Voskanyan

8:08 p.m.

Artyom wrote:

...

No, it will not, as Microsoft would not agree. This is why C++0x gives us char16_t and char32_t.

Microsoft's disagreement can not make something more conforming than it actually is, IMO :) But I'm just nitpicking, and this point is not that relevant in practice, so let's put it off.

...

AFAIK wchar_t is actually defined by C, and C allows even sizeof(wchar_t) == 1.

Sure it is allowed, but only as long as 1 byte is enough to encode all characters of the supported locales:

C99, 7.17: "wchar_t which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales"

Thanks,
Gevorg
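Whether a given implementation's wchar_t can represent every Unicode code point as a single value can be checked at compile time. This is only a sketch; the constant name `wchar_holds_any_code_point` is hypothetical. The largest code point is U+10FFFF, which needs 21 value bits.

```cpp
#include <limits>

// True when a single wchar_t can represent every code point up to U+10FFFF.
// Expected to be true where wchar_t is 32 bits wide and false where it is
// 16 bits wide.
constexpr bool wchar_holds_any_code_point =
    static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max())
        >= 0x10FFFFull;
```

On an implementation where this constant is false, wide strings that carry full Unicode have to use a multi-unit encoding such as UTF-16, which is the situation the conformance question above is about.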

Adam Merz

25 May

11:02 p.m.

Artyom <artyomtnk <at> yahoo.com> writes:

...

This is exactly the reason you can't implement utf-8 - utf-16 codepage conversion using codecvt facet.

And yet MSVC10 comes with a (presumably standard-compliant) codecvt that does exactly that: http://msdn.microsoft.com/en-us/library/ee292142.aspx In fact, it comes with codecvts that convert to and from UCS-4, UCS-2, UTF-16 and UTF-8, UTF-16LE, UTF-16BE: http://msdn.microsoft.com/en-us/library/ee336489.aspx Are these possible only because of functionality added in C++0x?

Artyom

26 May

4:55 a.m.

--- On Wed, 5/26/10, Adam Merz <adammerz@hotmail.com> wrote:

...

From: Adam Merz

And yet MSVC10 comes with a (presumably standard-compliant) codecvt that does exactly that: http://msdn.microsoft.com/en-us/library/ee292142.aspx

In fact, it comes with codecvts that convert to and from UCS-4, UCS-2, UTF-16 and UTF-8, UTF-16LE, UTF-16BE: http://msdn.microsoft.com/en-us/library/ee336489.aspx

Are these possible only because of functionality added in C++0x?

It is because the compiler has knowledge of std::mbstate_t which I do not have. But according to the latest discussions, it looks like I'll be able to implement full UTF-16 support in codecvt.

Artyom

Ryo IGARASHI

3:10 a.m.

Hi,

I like this library, as it will make working with I18n stuff easier for me.

2010/5/24 Artyom <artyomtnk@yahoo.com>:

...

As matter of fact Boost.Locale supports:

- narrow (normal) characters - char for 8 bits locale like ISO-8859-8. - narrow (normal) characters - char for variable length locale like UTF-8 or even Shift-JIS. - wide characters wchar-t for both UTF-16 (Windows) and UTF-32 (POSIX) encodings.

wchar_t may be neither UTF-32 nor UCS-4 on a POSIX system (Solaris/NetBSD do not define __STDC_ISO_10646__). See the following link for details on why some think UCS-4 wchar_t is not enough: http://www.usenix.org/event/usenix01/freenix01/full_papers/hagino/hagino_htm...

How about the various stateful ISO-2022-* encodings? Is conversion to/from a locale like ja_JP.ISO-2022-JP supported?

--
Ryo IGARASHI, Ph.D.
rigarash@gmail.com

Artyom

4:53 a.m.

...

How about various stateful ISO-2022-* family? Is conversion to/from locale like ja_JP.ISO-2022-JP supported?

Yes, via the to_utf<>() and from_utf<>() functions, but not via std::codecvt<>, as there is no way to preserve state.

In any case, if your compiler's/OS's codecvt<> facet supports such encodings, then you can always use the native codecvt.

Artyom

Artyom

25 May

5:29 a.m.

Hello All,

- Is there anybody who would volunteer for review management?
- When will the library get onto the review schedule: http://www.boost.org/community/review_schedule.html

Thanks,
Artyom

...

I want to request a formal review for Boost.Locale library.

Ronald Garcia

26 May

3:56 p.m.

Hi Artyom,

I have received your request and have added your library to the review queue.

Best,
Ron

On May 23, 2010, at 4:55 PM, Artyom wrote:

...

Hello,

I want to request a formal review for Boost.Locale library.

Short Description: ------------------

Boost.Locale is powerful localization library that provides powerful localization tool extending existing built-in C++ localization facilities in Unicode aware way.

Documentation: --------------

- Reference: http://cppcms.sourceforge.net/boost_locale/html/index.html
- Tutorial: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html

Sources: --------

- https://sourceforge.net/projects/cppcms/files/boost_locale/boost_locale_for_...

Tested Compilers: -----------------

- GCC 3.4, 4.1, 4.2, 4.3, 4.4 (including C++0x support of new Unicode characters)
- MSVC 2008
- Intel 11.0
- Sun Studio 12 with STLPort

Tested Platforms: -----------------

- Linux 2.6
- FreeBSD 8.0
- OpenSolaris
- Windows XP/SP2: including MSVC, MinGW and Cygwin

Updates from previous versions: -------------------------------

- Redesigned context information support in message catalogs: switched to native gettext msgctxt support.
- Cleanup of MSVC warnings.
- MSVC8 fixes.
- Build cleanup.

Thank You,

Artyom Beilis

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
