[Locale] Preview of 3rd version

Hello All,

Announcement
============

I want to announce a preview of the third version of Boost.Locale:

Tutorial: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html
Reference: http://cppcms.sourceforge.net/boost_locale/html/index.html
Downloads: https://sourceforge.net/projects/cppcms/files/boost_locale/

The following significant changes were made:

- Implemented multiple localization backends:
  - icu - the default and recommended backend, based on the ICU library
  - std - based on the C++ standard library's localization support
  - posix - based on the POSIX 2008 API (newlocale, strftime_l, ...)
  - winapi - based on Windows API functions
- Significantly simplified locale generation.
- Improvements in UTF-8 handling by ICU where possible.
- Thread safety fixes when using the ICU library.
- Fixed std::codecvt facet support to handle UTF-16 instead of UCS-2 only.
- Removed support for compilers missing wide character support; gcc-3.4 on
  Windows is no longer supported. A recent gcc-4.x with support for wide
  streams and strings is required, for example gcc-4.5.

Tested Platforms:

- Compilers: GCC (3.4, 4.2, 4.3, 4.5, 4.5/c++0x), Intel 11.0, MSVC 2008,
  SunStudio/STLport
- Operating Systems: Linux, FreeBSD, OpenSolaris, Windows XP, Cygwin 1.7,
  (TODO Mac OS X)
- ICU versions: 3.6 to 4.4

Request To Site Managers
========================

Please update the download link.

Review
======

Is there anybody who wants to volunteer to be a review manager?

Artyom

Hi,

I'm reading the tutorial and I noticed this line:

  This technique was adopted by Boost.Locale library in order to provide
  powerful and correct localization. However instead of using standard and
  very limited standard library C++ facets it created its own facets that
  use ICU under the hood in order to make much powerful.

Now that there are several possible backends, perhaps this sentence should
be changed?

Joël Lamotte.

On Fri, Sep 10, 2010 at 12:19, Artyom <artyomtnk@yahoo.com> wrote:

Hi again,

I've finished reading the documentation and I just have a question: what if
I want to manually inject the content of the translation/dictionary files
into the generator instead of using the add-path functions? Looking at the
interface of the boost::locale::generator class, I don't see a way to do
this other than having a "real filesystem" set of dictionary files.

For example, let's say that in my project I'm using some kind of
file-system virtualization library like PhysicsFS (http://icculus.org/physfs/).
My translation files are zipped together (with other files) and this
library makes it easy to manipulate them without having to bother about
compression. I explicitly don't want to use the real filesystem, and I can
only read the content of files using the PhysicsFS file-reading functions,
getting raw bytes. In this case I need a way to inject those raw bytes into
boost::locale as dictionary-file content.

Is there already a way to do this, or do you recommend another way to
manage this kind of case? Using some kind of real-filesystem cache
directory would work, I guess, but it's far from ideal.

Joël Lamotte

On Fri, Sep 10, 2010 at 15:46, Klaim <mjklaim@gmail.com> wrote:

Hi again,
Hello,
I've finished reading the documentation and I just have a question : what if I want to inject manually the content of the translation/dictionary files in the generator instead of using the add path functions?
How exactly do you expect to inject such translations? Boost.Locale
dictionaries are actually GNU gettext compiled dictionaries. What do you
expect to provide to the Boost.Locale interface?

Don't forget that it is not that simple - it is not just "give a
dictionary"; there may be a set of fallbacks. For example, if you have the
locale en_GB.UTF-8 but you don't have a dictionary under
/en_GB/LC_MESSAGES/foo.mo, it would try to load it from
/en/LC_MESSAGES/foo.mo, so I don't exactly understand how you expect to do
this manually. Also, it does not load all dictionaries at once; it only
loads those required for the specific locale, not for all locales.

So basically you are asking for a "file-system" plugin that allows you to
see anything as a file system... I don't think that is fully reasonable.

However, if you want to have your own specific dictionaries in a specific
format, all you need is to implement this facet:

http://cppcms.sourceforge.net/boost_locale/html/classboost_1_1locale_1_1mess...

and install it into the generated locale. All functions like translate,
gettext, and dgettext would then work for you.
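The fallback chain described above (en_GB.UTF-8 falling back to en) can be
sketched in a few lines of plain C++. This is only an illustration of the
search order; the function name `candidate_paths` is hypothetical and this
is not Boost.Locale's actual implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch of gettext-style fallback: for a locale id such as
// "en_GB.UTF-8", the dictionary is searched first under the full
// language_COUNTRY directory and then under the bare language directory.
std::vector<std::string> candidate_paths(std::string locale,
                                         const std::string& domain)
{
    // Strip the ".UTF-8" encoding suffix, if any.
    std::string::size_type dot = locale.find('.');
    if (dot != std::string::npos)
        locale.erase(dot);

    std::vector<std::string> paths;
    paths.push_back("/" + locale + "/LC_MESSAGES/" + domain + ".mo");

    // Fall back from "en_GB" to "en".
    std::string::size_type underscore = locale.find('_');
    if (underscore != std::string::npos)
        paths.push_back("/" + locale.substr(0, underscore)
                        + "/LC_MESSAGES/" + domain + ".mo");
    return paths;
}
```

The first path that actually exists on disk would win; a locale without a
country part produces only one candidate.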
Is there already a way to do this, or do you recommend another way to manage this kind of case? Using some kind of real-filesystem cache directory would work, I guess, but it's far from ideal.
Boost.Locale uses the gettext model, and I don't think you will find any other localization system that does not require loading some files from the file system. In your case, you may use a temporary directory, or just implement your own facet that translates such messages.

Artyom

On Fri, Sep 10, 2010 at 11:49 AM, Artyom <artyomtnk@yahoo.com> wrote:
So basically you are asking for a "file-system" plugin that allows you to see anything as a file system... I don't think that is fully reasonable.
Would it really be that hard to make this policy-based? I think it's a valid concern. There is a lot of software that packs files together inside a zip or another container, for better or worse.

--
Cory Nelson
http://int64.org

There is a lot of software that packs files together inside a zip or another container, for better or worse.
But it also gets unzipped before it runs... You can't run an exe from a zip archive; it is always unzipped to some temporary location first. In the same way, all the dictionary files can be unzipped into some temporary location.

Artyom

On Fri, Sep 10, 2010 at 11:21 PM, Artyom <artyomtnk@yahoo.com> wrote:
There is a lot of software that packs files together inside a zip or another container, for better or worse.
But it also gets unzipped before it runs... You can't run an exe from a zip archive; it is always unzipped to some temporary location first.
I'm sorry, I didn't mean as a distribution format, but as the primary data store used by the application. Some apps (it sounds like Joël's is one) keep their assets in a .zip or some other container, separate from their executables. One good example would be games, where such virtual filesystems are quite popular as a way of efficiently grouping files and portably reducing I/O and fragmentation.

Perhaps you could let users supply a functor open_file(filename), and they can choose to open the path however they want, from whatever source. The default functor would use what you currently have. Would this be prohibitively difficult?

--
Cory Nelson
http://int64.org
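The hook proposed here could look roughly like the sketch below. The names
`open_file_fn` and `load_dictionary` are hypothetical, invented for this
illustration; nothing like this exists in Boost.Locale's interface:

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of the proposed extension point: the loader asks a
// user-supplied functor for a FILE*, so dictionaries could come from a
// zip, a sandboxed IPC channel, or any other virtual file system. When no
// functor is supplied, the real file system is used, as today.
using open_file_fn = std::function<std::FILE*(const std::string&)>;

std::vector<char> load_dictionary(const std::string& path,
                                  open_file_fn open_file = nullptr)
{
    std::FILE* f = open_file ? open_file(path)
                             : std::fopen(path.c_str(), "rb");
    std::vector<char> bytes;
    if (!f)
        return bytes;                       // missing file: empty result
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        bytes.insert(bytes.end(), buf, buf + n);
    std::fclose(f);
    return bytes;
}
```

A PhysicsFS-style user would supply a functor that opens the path inside
the archive (on glibc, in-memory data can even be wrapped as a FILE* with
fmemopen); everyone else would never notice the hook exists.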

Perhaps you could let users supply a functor open_file(filename), and they can choose to open the path however they want, from whatever source. The default functor would use what you currently have. Would this be prohibitively difficult?
Yes, it is doable. However, I don't think it is reasonable; I haven't seen any localization system that does this. Putting dictionaries in the file system and using them from there is common, and it does not have any performance impact, as dictionaries should be loaded only once into memory, unlike other game resources like video or images, where fragmentation or access time may have an effect.

In any case, I would say you can always implement your own boost::locale::message_format facet to deal with such a non-standard situation.

Artyom

On Sat, Sep 11, 2010 at 04:59:23AM -0700, Artyom wrote:
Perhaps you could let users supply a functor open_file(filename), and they can choose to open the path however they want, from whatever source. The default functor would use what you currently have. Would this be prohibitively difficult?
Yes, it is doable.
However, I don't think it is reasonable; I haven't seen any localization system that does this. Putting dictionaries in the file system and using them from there is common, and it does not have any performance impact, as dictionaries should be loaded only once into memory, unlike other game resources like video or images, where fragmentation or access time may have an effect.
There are many situations where it is unfeasible or even impossible to write to, or even read from, the filesystem, or to have more than a small set of files available, if any at all. Consider these use cases:

* a plugin component for some existing application in the form of a DLL/so, where you would store localization information inside your executable image;
* an ActiveX component for the web, where you have significantly restricted permissions and probably fall under the above case, but with more restrictions;
* a game or other application where your resources are baked into virtual filesystem images for efficiency and organizational purposes;
* a video game console, where you may not have a filesystem to work with, or only non-standard interfaces for working with one;
* any kind of application where you may retrieve the resources from external sources, like other parts of the filesystem or a remote host.

The last case may sound a bit remote, but consider the case where you have an application installed into a privileged location like Program Files or /usr. Assume that an end user wants to add new localizations (which happens in real life) and he might not have write permissions for the application directory. A solution is to have him put his files in a directory he has permission to use, typically in his home directory, from which the localization lookups can be overridden.
In any case, I would say, you can always re-implement your own boost::locale::message_format facet to deal with such non-standard situation.
Judging by the name of that type and its apparent purpose, it sounds like quite a strange place to be hacking in file handling when it should be properly extensible. The counter-argument that you need directory information for traversal sounds like something that should be easily accommodated by a proper extension point. As far as I can see, to support your fallbacks all you need is the ability to query and enumerate entities in a hierarchy and retrieve istreams to them.

For such an apparently rich and useful library, I find this restriction rather strange.

--
Lars Viklund | zao@acc.umu.se

There are many situations where it is unfeasible or even impossible to write to or even read from the filesystem, nor have more than a small set of files available, if any at all.
Consider these use cases: * a plugin component for some existing application in the form of a DLL/so, where you then would store localization information inside your executable image;
There are lots of such situations in real-world applications that use GNU gettext; you just add a specific dictionary for a specific domain and install it together with the shared object.
* an ActiveX component for the web, where you have significantly restricted permissions and probably fall under the above case, but with more restrictions;
1. ActiveX should die... but that is another story ;-)
2. You probably would not be able to use ICU in ActiveX anyway, but in this case I can agree that the file system may be a strong restriction.
* a game or other application where your resources are baked into virtual filesystem images for efficiency and organizational purposes;
I've mentioned that efficiency is not an issue
* a video game console, where you may have a filesystem to work with, or non-standard interfaces to work with one;
You would not even have ICU on such a platform, let alone any reasonable localization support from the C++ or even the C library. So Boost.Locale would be quite useless there. So I don't really buy this.
* any kind of application where you may retrieve the resources from external sources, like other parts of the filesystem or from a remote host.
Translation dictionaries are part of the program installation, just like icons, tables, or 1001 other non-code resources, so they simply ship together with them.
The last case may sound a bit remote, but consider the case where you have an application installed into a privileged location like Program Files or /usr. Assume that an end user wants to add new localizations (which happens in real life) and he might not have write permissions for the application directory. A solution is to have him put his files in a directory he has permission to use, typically in his home directory, from which the localization lookups can be overridden.
This case works perfectly with Boost.Locale: if the application just adds a path to the location where the user stores his dictionaries, he can easily add them. There can be more than one path where the application looks for dictionaries. The program developer can always add:

  $(HOME)/share/locale

and

  /usr/share/locale

and the user would then easily be able to add any other dictionaries on his own.
In any case, I would say, you can always re-implement your own boost::locale::message_format facet to deal with such non-standard situation.
Judging by the name of that type and it's apparent purpose, it sounds like quite a strange place to be hacking in file handling when it should be properly extendable.
Not exactly; Boost.Locale relies on the following chain:

  gettext, translate -> boost::locale::message_format facet
  boost::locale::generator -> localization_backend -> creates the gettext implementation of this facet

In fact, non-gettext dictionaries could be added in the future and the application would not notice. So adding special functionality for retrieving text files would just break the abstraction between the user and the localization backend.
The counter-argument that you need directory information for traversal sounds like something that should be easily accomodatable in a proper extension point. As far as I can see to support your fallbacks, all you need to have is the ability to query and enumerate entities in a hierarchy and retrieve istreams to them.
The simplest thing I can do is give the user an additional member like:

  boost::function<FILE *(std::string const &)> open_file

there:

http://cppcms.svn.sourceforge.net/viewvc/cppcms/boost_locale/tags/v2.91/boost/locale/gnu_gettext.hpp?revision=1425&view=markup

and let him load files from his non-standard file system and install his own message facet by calling create_messages_facet.

But somebody would complain that you need to open a file for this... and I have **very good** reasons to use FILE * and not fstream:

1. Windows - wide paths:
   a) I need to support UTF-8 paths (so I need to convert them to UTF-16 and then call _wfopen);
   b) I can't use std::ifstream, as it does not support wide paths, and I don't want to add a dependency on another Boost library;
   c) I need to support both the ANSI and the Wide API under Windows, but I want to keep one type of path and not inject wide strings where I don't need them.

2. Memory mapping:
   I expect at some point to load dictionaries for UTF-8 charsets via memory mapping, so that memory is not wasted on each loaded dictionary by each application. And you can get a file descriptor only through the FILE interface, not through fstream.

This is why I keep this as an internal feature and do not expose it to the user.
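The memory-mapping point is the crux of the FILE* argument: fileno() gives
the underlying descriptor, which fstream does not portably expose, and the
descriptor can then be mmap()ed. A POSIX-only sketch of that idea (this is
illustrative code, not Boost.Locale's implementation; `read_mapped` is a
made-up name):

```cpp
#include <cstdio>
#include <string>
#include <sys/mman.h>   // POSIX only: mmap/munmap
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only illustration: a FILE* yields a descriptor via fileno(), and
// mmap() lets every process share one read-only copy of a dictionary
// instead of each holding a private heap buffer.
std::string read_mapped(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return "";
    int fd = fileno(f);              // not possible with std::ifstream
    struct stat st;
    fstat(fd, &st);
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        std::fclose(f);
        return "";
    }
    std::string contents(static_cast<char*>(p), st.st_size);
    munmap(p, st.st_size);
    std::fclose(f);
    return contents;
}
```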
For such an apparently rich and useful library, I find this restriction rather strange.
As you can see from the notes above, it is not simple to load a file under a modern OS like Windows, which makes our lives much more... challenging (or miserable).

When I built this library, I tried to build a useful localization library that allows doing the common tasks the right way. I haven't tried to implement all possible requirements for all potential users. When it comes to localization, there are much bigger problems than how to load a particular file; the biggest problem is that 99% of developers are aware of only 1% of the potential problems that can be met in this area.

Actually, one of the most widely deployed localization libraries - gettext - has much harder and stricter restrictions, and yet Boost.Locale implements everything gettext gives and much more. I haven't seen too many complaints from gettext developers about how dictionaries can be loaded (unlike other issues).

Bottom line
-----------

If it is so important, I can add this. But believe me, I wish the biggest issue in the localization area were how to load dictionaries from virtual and non-standard file systems :-)

Artyom

On Sat, Sep 11, 2010 at 11:20 AM, Artyom <artyomtnk@yahoo.com> wrote:
* an ActiveX component for the web, where you have significantly restricted permissions and probably fall under the above case, but with more restrictions;
1. ActiveX should die... but that is another story ;-)
2. You probably would not be able to use ICU in ActiveX anyway, but in this case I can agree that the file system may be a strong restriction.
A similar example is Google Chrome. Because it sandboxes processes, they needed to write a custom virtual filesystem for SQLite that routes file opening to another process.
* a game or other application where your resources are baked into virtual filesystem images for efficiency and organizational purposes;
I've mentioned that efficiency is not an issue
Presumptuous.
* a video game console, where you may have a filesystem to work with, or non-standard interfaces to work with one;
You would not even have ICU on such platform not talking about any reasonable localization support from C++ or even C library.
So boost.locale would be quite useless. So I don't really buy this.
Again, presumptuous.
But somebody would complain that you need to open a file for this... and I have **very good** reasons to use FILE * and not fstream.
1. Windows - wide path:
a) need to support UTF-8 paths (so need to convert them to UTF-16 and then call _wfopen)
b) I can't use std::ifstream as it does not support wide paths and I don't want to add a dependency on another Boost library
c) I need to support both the ANSI and Wide API under Windows but I want to keep one type of path and not inject wide strings where I don't need them.
I'm not sure about other implementations, but VC++ has supported wide paths in fstream since 2008.
2. Memory mapping
I expect at some point to load dictionaries for UTF-8 charsets via memory mapping, so it would not waste memory for each loaded dictionary by each application.
Maybe add another function<pair<shared_ptr<char const>,size_t>> then.
Actually one of the most widely deployed localization library - gettext has much harder and stricter restrictions, and yet Boost.Locale implements all gettext gives and much more.
I haven't seen too many complaints from gettext developers about how dictionaries can be loaded (unlike other issues)
Most libraries are not held to the same standard that Boost libraries are.
If it is so important I can add this. But believe me, I wish it would be the biggest issue in localization area - how to load dictionaries from virtual and non-standard file system :-)
I think anyone who's dealt with l10n wishes this! Thank you.

--
Cory Nelson
http://int64.org

Hi,
I'm reading the tutorial and I noticed this line :
This technique was adopted by Boost.Locale library in order to provide powerful and correct localization. However instead of using standard and very limited standard library C++ facets it created its own facets that use ICU under the hood in order to make much powerful.
As now there are several possible backends, it might be good to change this sentence?
Probably; but Boost.Locale's primary and recommended backend is still ICU, because it not only does the job, it is also the one that does the job right. Now, if you do not need all the power of ICU, you may use the other, non-ICU backends, which still do much more than the standard C++ std::locale and its facets do.

Also note that the first version of Boost.Locale had no backend support at all and required ICU; this changed in this version.

Artyom

On 10/09/2010 11:19, Artyom wrote:
- Fixed std::codecvt facet support to handle UTF-16 instead of UCS-2 only.
Did you test that? From a glance at the code, I have a feeling it won't work correctly with MSVC. From the tests I did, it seems MSVC only reads the first wchar_t in the output of do_in.

Yes. Are you using the latest version 2.x from the SourceForge site, or did you take "/trunk"? Because the latest Boost.Locale sits in its own branch - rework.
From the tests I did, it seems MSVC only reads the first wchar_t in the output of do_in.
Yes, I did: I saved surrogate pairs to and from a file and used various combinations of them. What exactly did you see? In what case? Did you save into a file or into std::wcout? Can you send me sample code that shows the issue?

Artyom

On 11/09/2010 19:30, Artyom wrote:
Yes, are you using the latest version 2.x from sourceforge site or you had taken the "/trunk"? Because latest boost.locale sits in its own branch - rework.
I didn't use your library.
What exactly did you seen? In what case? Do you save into file or into std::wcout?
do_in gets called in the file-to-memory case. I'm talking about a codecvt facet that converts UTF-8 in files to UTF-16 in memory.

The behaviour I've observed is the following: the implementation of fstream in MSVC9 seems to call 'in' character by character, calling it again with one more character appended when partial is returned. Then, in the case of 'ok', it just reads the first wchar_t written to the output and ignores the second one that would be written in the case of surrogates.

But then, looking at your library, you seem to do some weird (and dangerous!) reinterpret casting, which suggests you're not interfacing fstream directly with a std::codecvt<wchar_t, char, std::mbstate_t> facet. How did you make that work?
Can you bring me the sample code that shows the issue?
Attached is a testcase that demonstrates the bug in MSVC9. It prints "65 65 65 65 65" instead of "65 66 65 66 65 66 65 66 65 66".

Hello,
Yes, are you using the latest version 2.x from sourceforge site or you had taken the "/trunk"? Because latest boost.locale sits in its own branch - rework.
I didn't use your library.
Ah, I see. I do the following: when I read, for example, 4 bytes of UTF-8 that decode to a codepoint > 0xFFFF:

1. I write the first surrogate of the pair to the output stream, I update the state to reflect that the first half of the pair was written, and **I do not consume the input**.
2. I see the same 4 UTF-8 bytes again, notice that the state says the first half of the pair was already written, so I write the second half and consume the input.

So do_in is actually called twice for the same input.
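For reference, the arithmetic behind the pair being written across those
two do_in calls is standard UTF-16 encoding; a minimal sketch (the function
name is illustrative, not library code):

```cpp
#include <cstdint>
#include <utility>

// Split a codepoint above U+FFFF into a UTF-16 surrogate pair.
// The codepoint's 20-bit offset from 0x10000 is divided into a high
// surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp)
{
    cp -= 0x10000;                                 // 20-bit offset
    std::uint16_t high = 0xD800 + (cp >> 10);      // top 10 bits
    std::uint16_t low  = 0xDC00 + (cp & 0x3FF);    // bottom 10 bits
    return { high, low };
}
```

For example, the 4-byte UTF-8 sequence F0 90 80 80 decodes to U+10000,
which splits into the pair 0xD800, 0xDC00.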
But then, looking at your library, you seem to do some weird (and dangerous!) reinterpret casting,
which suggests you're not making the fstream interface directly with a std::codecvt<wchar_t, char, std::mbstate_t> facet.
Actually, mbstate_t is a POD type that should be initialized to 0. I make sure that sizeof(mbstate_t) >= 2, and then I use it as temporary storage for the state. So it is fine to use it as storage for state.

The biggest problem is that the standard says nothing about mbstate_t except that it is a POD initialized to 0, and that is what I rely on.

Artyom
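The "mbstate_t as opaque storage" trick described here can also be written
without any casting at all; memcpy sidesteps the aliasing questions raised
later in this thread. A sketch under the same sizeof(mbstate_t) >= 2
assumption (illustrative helpers, not the library's actual code):

```cpp
#include <cstdint>
#include <cstring>
#include <cwchar>

// Sketch: treat std::mbstate_t as opaque storage for a 16-bit value,
// e.g. a pending low surrogate. memcpy avoids aliasing issues entirely.
static_assert(sizeof(std::mbstate_t) >= 2, "mbstate_t too small");

void store_state(std::mbstate_t& st, std::uint16_t v)
{
    std::memcpy(&st, &v, sizeof v);
}

std::uint16_t load_state(const std::mbstate_t& st)
{
    std::uint16_t v;
    std::memcpy(&v, &st, sizeof v);
    return v;
}
```

Since a value-initialized mbstate_t is all zeros, a freshly constructed
state reads back as 0, i.e. "no surrogate pending".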

On 11/09/2010 20:34, Artyom wrote:
Ahh I see, I do following:
When I read, for example, 4 bytes of UTF-8 that decode to a codepoint > 0xFFFF, I do the following:
1. I write first surrogate pair to output stream, I update the state to reflect that first part of the pair was written and **I do not consume input** 2. Same 4 utf-8 bytes again and see that state is marked to that first part of pair was written so I write the second and consume the input.
So actually do_in called twice for same input.
The code in question is in a loop that keeps going until 'from' reaches 'from_end' or the conversion fails (due to insufficient input or otherwise), so both surrogates should be written in the same do_in invocation.
Actually the mbstate_t is POD type that should be initialized to 0. I must make sure that sizeof(mbstate_t)>= 2, and then I use it as temporary storage for state.
I'm not talking about that; I meant the reinterpret casting between uchar and uint_type. But actually I suppose they're the same type, maybe just with different signedness, so that should be somewhat OK. It's still not allowed by the strict aliasing rules, though.

The code in question is in loop that keeps on going until from reaches from_end or the conversion fails (due to insufficient input or otherwise), so both surrogates should be written in the same do_in invocation.
Take a look at my code: I don't update from_next until I write both surrogates; it is just that I sometimes write them separately, as usually uto + 1 == uto_end in the MSVC implementation. So technically your code is wrong, as you write into the position at uto_end.
Actually the mbstate_t is POD type that should be initialized to 0. I must make sure that sizeof(mbstate_t)>= 2, and then I use it as temporary storage for state.
I'm not talking about that, I meant the reinterpret casting between uchar and uint_type, but actually I suppose they're the same, maybe just different signedness, so that should be somewhat ok. It's still not allowed by the strict aliasing rules though.
I use a uint16_t cast when sizeof(wchar_t) == 2 and a uint32_t cast when sizeof(wchar_t) == 4; I don't see any problem with this. So I always cast to a pointer to a virtually identical binary type.

Artyom

On 11/09/2010 21:10, Artyom wrote:
Take a look on my code, I don't update from_next till I write both surrogates, just sometimes I write them separately as usually uto + 1 == uto_end in MSVC implementation.
Whoa, I wonder how I didn't catch that one before! I never really checked the size of uto, because my converters do not support checking the size of the output, since that's normally a waste of time. But then it can't use max_length, because that's for the other direction. I really wonder why they didn't make codecvt symmetric.
So technically your code is wrong as you write into position of uto_end
Indeed, thanks for elucidating that mystery for me.
I use uint16_t cast when sizeof(wchar_t) == 2 and uint32_t cast when sizeof of wchar_t == 4, I don't see any problem with this.
According to the C++ standard, it is illegal for pointers of different types to reference the same memory location, with a few exceptions:

- char* may safely alias any object;
- signedness and cv qualifications don't count as a different type with regard to that rule.

Unfortunately, wchar_t is a different type from uint16_t or uint32_t, so what you are doing probably counts as breaking the strict aliasing rule.
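For anyone who wants to stay within the rule cited above, memcpy into an
unsigned type of matching size gives the same bits without aliasing two
pointer types. This is an illustrative alternative, not Boost.Locale's
code; `wchar_bits` and `to_bits` are made-up names:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Pick the unsigned integer type with the same size as wchar_t
// (2 bytes on Windows, 4 on most Unix systems), then memcpy instead of
// reinterpret_cast, which keeps strict aliasing intact.
using wchar_bits = std::conditional<sizeof(wchar_t) == 2,
                                    std::uint16_t,
                                    std::uint32_t>::type;

wchar_bits to_bits(wchar_t c)
{
    static_assert(sizeof(wchar_bits) == sizeof(wchar_t),
                  "unsupported wchar_t size");
    wchar_bits out;
    std::memcpy(&out, &c, sizeof c);
    return out;
}
```

Compilers routinely optimize such a small memcpy into the same single load
a cast would produce, so there is no runtime cost for the legality.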

Unfortunately, wchar_t is a different type from uint16_t or uint32_t, so what you are doing probably counts as breaking the strict aliasing rule.
C++0x char16_t and char32_t are defined as:
Types _Char16_t and _Char32_t denote distinct types with the same size, signedness,
and alignment as uint_least16_t and uint_least32_t, respectively, in <stdint.h>, called the underlying types.
So no problems there. Now, wchar_t is very weakly defined, so it can be any size, but de facto it is a 2- or 4-byte integer type. So it is fine to cast it to another value of such a type; all other possible sizes are not supported by Boost.Locale, and it would not even compile with a different size. So I don't see any possible issues there, even if on paper it may not work.

That is my least problem; there are much bigger problems, like the crappy support of locales by most C++ standard libraries.

Artyom

On 12/09/2010 11:32, Artyom wrote:
Unfortunately, wchar_t is a different type from uint16_t or uint32_t, so what you are doing probably counts as breaking the strict aliasing rule.
C++0x char16_t and char32_t are defined as:
Types _Char16_t and _Char32_t denote distinct types with the same size, signedness,
That's not how it is in C++0x. char16_t and char32_t are keywords in their own right, which has the bad effect of preventing you from defining types with such names.
and alignment as uint_least16_t and uint_least32_t, respectively, in<stdint.h>, called the underlying types.
So no problems there,
Yes there are. It has nothing to do with size and alignment. They are different types, so the compiler is allowed to assume the memory doesn't alias so that it can do smart optimizations. If you compile your code with -Wstrict-aliasing with GCC, you will get a warning that says so.

On 12/09/2010 11:32, Artyom wrote:
Unfortunately, wchar_t is a different type from uint16_t or uint32_t, so what you are doing probably counts as breaking the strict aliasing rule.
C++0x char16_t and char32_t are defined as:
Types _Char16_t and _Char32_t denote distinct types with the same size, signedness,
That's not how it is in C++0x. char16_t and char32_t are keywords in their own right, which has the bad effect of preventing you from defining types with such names.
and alignment as uint_least16_t and uint_least32_t, respectively, in<stdint.h>, called the underlying types.
So no problems there,
Yes there are. It has nothing to do with size and alignment. It's different types, so the compiler is allowed to assume the memory doesn't alias so that it can do smart optimizations.
If you compile your code with -Wstrict-aliasing with GCC, you will get a warning that says so.
First, it doesn't warn. But anyway, there is no problem, as I don't rely on strict aliasing: I convert pointers to ones of compatible types and then forward them to the appropriate function. I don't do stuff like this:

  wchar_t buf[2] = { a, b };
  uint16_t *ptr = (uint16_t *)buf;
  foo(ptr);
  return buf[0]; // which may be dangerous

I cast these pointers to the appropriate uintXX_t type and use only them:

  ptr[0]; // ok

Without these casts the code would be much bigger and more complicated. So when you use them with care, there should be no problems. These casts let me detach from the original types and work with other well-defined types of known signedness and size. Thanks for the point, I wasn't fully aware of this. Artyom

On 12/09/2010 14:13, Artyom wrote:
On 12/09/2010 11:32, Artyom wrote:
Unfortunately, wchar_t is a different type from uint16_t or uint32_t, so what you are doing probably counts as breaking the strict aliasing rule.
C++0x char16_t and char32_t are defined as:
Types _Char16_t and _Char32_t denote distinct types with the same size, signedness,
That's not how it is in C++0x. char16_t and char32_t are keywords in their own right, which has the bad effect of preventing you from defining types with such names.
and alignment as uint_least16_t and uint_least32_t, respectively, in<stdint.h>, called the underlying types.
So no problems there,
Yes there are. It has nothing to do with size and alignment. It's different types, so the compiler is allowed to assume the memory doesn't alias so that it can do smart optimizations.
If you compile your code with -Wstrict-aliasing with GCC, you will get a warning that says so.
First, it doesn't warn.
I forgot to mention it also requires strict aliasing to be enabled, which is the case with -O3. I do get the warning with the following code:

  int main()
  {
      wchar_t data[] = L"foo";
      unsigned int* foo = (unsigned int*)data;
      *foo = 0;
  }

(on Linux x86, where unsigned int and wchar_t have the same size and alignment)

Lars listed some reasons I was thinking about when asking for more flexibility around the dictionary-loading features, so I will try to make myself more clear and specific. After some thought, I'll try to explain what I know I would need from a localization library, from my limited experience with some at-work home-made tools. (In video games we don't use a "standard" localization tool, because most available tools - as you already mentioned - assume a classic filesystem, and have other inflexibilities incompatible with some game-engine architectures, even on desktop games, so this kind of tool is often house-made.)

A. Dictionary source
B. Dictionary loading control
C. Dictionary format

Those customization points shouldn't be directly related (orthogonal?).

A. Dictionary source:

Currently, if my understanding is correct, the boost::locale library will always assume that dictionary files are on the (standard?) filesystem. As Lars and others already said, there are some cases (embedded systems and games) where this is simply not the case; the game structure requires getting data from somewhere else: baked resource packs, the network, RAM, or elsewhere. I've seen some domain-specific libraries used in the video-game industry and other industries that provide a simple way to fix this without handling every case:

1. They provide a way to load domain-specific data (in our case, dictionaries) from any source by allowing the user to provide a custom "data stream" class that the library will use to pull/read the data. For example, OGRE (a graphics rendering engine) allows loading textures and models from anywhere by providing such a mechanism. Some people use it to feed the engine with textures and meshes from the network, with a central server providing the resources to the clients (for some simulation applications, if I remember well).

2. They provide "helper" functions that assume the data are on the filesystem, simplifying the use of dictionary files when no custom data source is required.

That makes those libraries data-source-independent. In fact, I just remembered that all the libraries I have used so far (for games and non-games) provide such a way to plug any source of data into the library. That also gives the user an easy way to change the data source later if needed.

B. Dictionary loading control

This is about "when is a dictionary loaded into memory and usable without having to process something first?". If my understanding is correct, boost::locale will automatically load the dictionary when needed? I guess it will load the dictionary when the corresponding language/domain is invoked?

Anyway, some way to manually load and unload dictionaries (or the dictionaries related to a locale?) would help control the application's performance/flow. For example, most games first load all "whole app life" resources at startup, then load "world-chunk-specific" resources each time they need them, and unload those resources at some point without exiting the whole app, just to free the memory for another world chunk. After having loaded a world chunk, there shouldn't be any allocation/deallocation, because that would easily slow down the frame rate and make it fluctuate in an unpleasant way.

In my own case, I also have to manage user-made modules that carry localization information that should be loaded when the module is used, but not otherwise. The module structure of my application and memory limitations make it impossible to load all modules at startup; that would be too much, and I don't even know how many modules will be available some time after release. Manual control over when to load/unload what is required for my current "big" game. So some manual control on this side would be of great help. Maybe some kind of strategy could be provided by the user?

C. Dictionary format

You already pointed out a way to provide a custom format for dictionaries, so this is good from my point of view. A lot of companies use simple Excel/CSV files to manipulate localization texts, making it simple to hand the texts over to localization companies for translation. I would only criticize the "domain" thing, but that's a gettext philosophy thing (read further).

So those points of customization would be necessary for almost all my own projects (games or not). I understand that I might not be in the general case - not sure about this. However, I think all programming libraries should at least be data-source agnostic.

That said, I have to say that I have often searched for good alternatives to gettext, as it always seemed inadequate for my use (at work AND at home) for reasons other than the previous ones:

- the ids have to be strings (or am I wrong?) - letting the user provide custom ids would help him manage tools/performance on his side
- it assumes that the string id will carry some context information allowing it to pick the right localization. It looks like a hack to me, because I think each unique text should have a unique id. That way you can have the same English words with different ids, allowing different words in another language. The domain string seems to be some hack to fix this case. I would prefer some way to get a unique id for each text, provided by the user. As boost::locale follows the gettext philosophy, I don't see how it would be possible to change this without changing the backend.

I'm not sure I'm clear about all of this, so ask me if you don't understand something. (Sorry, my English isn't perfect.) The current boost::locale is already great work that I'd like to use as soon as I'm in a case suited to its use. So I forgot to say: good work :)

About this
Actually, one of the most widely deployed localization libraries - gettext - has much stricter restrictions, and yet Boost.Locale implements everything gettext provides, and much more.
I haven't seen many complaints from gettext users about how dictionaries are loaded (unlike other issues).
Most libraries are not held to the same standard that Boost libraries are.
There are (good and sometimes bad) reasons why almost all game-industry developers don't use (all of) boost and gettext. However, I'm making a desktop game that heavily uses boost, and so far it was a really good idea - the alternative was POCO. I'd like to help fight the often-wrong belief that the STL and boost are bad for games (and I'm not the only one, it seems: http://gamedev.stackexchange.com/questions/268/stl-for-games-yea-or-nay - see the answer with the most points, not the checked one).

I planned to write a specific solution for my game's localization, it having a somewhat complex user-provided-module-based structure, but if boost::locale provides a solution for the points I've listed, then I can plug it into my game without a problem, and that will simplify a lot of things (assuming performance is correct for my needs). For the moment, I'll keep following how boost::locale goes until I reach the point where I need to make a final decision.

Joël Lamotte

On Sat, Sep 11, 2010 at 21:22, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 11/09/2010 19:30, Artyom wrote:
Yes. Are you using the latest version 2.x from the sourceforge site, or did you take "/trunk"? Because the latest boost.locale sits in its own branch - rework.
I didn't use your library.
What exactly did you see? In what case? Do you save into a file or into std::wcout?
do_in gets called in the file to memory case. I'm talking of a codecvt facet that converts UTF-8 in files to UTF-16 in memory.
The behaviour I've observed is the following: the implementation of fstream in MSVC9 seems to call 'in' one char at a time, calling again and appending one character when partial is returned.
Then, in case of 'ok', it just reads the first wchar_t written on the output, and ignores the second that would be written in the case of surrogates.
But then, looking at your library, you seem to do some weird (and dangerous!) reinterpret casting, which suggests you're not making the fstream interface directly with a std::codecvt<wchar_t, char, std::mbstate_t> facet. How did you make that work?
Can you bring me the sample code that shows the issue?
Attached is a testcase that demonstrates the bug in MSVC9. It prints "65 65 65 65 65" instead of "65 66 65 66 65 66 65 66 65 66".
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

A. Dictionary source :
Currently, if I my understanding is correct, the boost::locale library will always assume that dictionary files are on the (standard?) filesystem.
It would be easy to fix.
For example OGRE (graphic rendering engine) allow loading textures and models from anywhere by providing such a mechanism.
Small notice: unlike textures and other game resources, the text itself is usually very small by nature, as it is only text. And usually games (client side) are played in one language only, so I really doubt that dictionaries should be treated the same way as other resources. For example, the Russian translations of almost all the software on my Debian PC take about 12MB, and that covers about 250 applications.
B. Dictionary loading control
This is about "when is a dictionary loaded in memory and usable without having to process something first?". If my understanding is correct, boost::locale will automatically load the dictionary when needed? I guess it will load the dictionary when the corresponding language/domain will be invoked?
The dictionaries are loaded when the locale is created. Usually locale generation is not cheap, so you create it only once at startup and use it. I.e., when you call:

  generator gen;
  gen.add_messages_path("/usr/share/locale");
  gen.add_messages_domain("my_cool_app");
  std::locale mine = gen("ru_RU.UTF-8");

all dictionaries for the application my_cool_app for the ru_RU locale are loaded.
Anyway, some ways to manually load and unload dictionaries (or dictionaries related to a locale?) would help controlling the application performance/flow.
When you destroy the locale, the dictionary is destroyed.
For example most games first load all "whole app life" resources on startup, then will load "world-chunk-specific" resources each time it need it and will unload those resources at some point without exiting the whole app,
You can bind world-chunk-specific translation strings to another domain and load it with a locale, then destroy the locale, or cache it, or... whatever you do. If you want good performance for dictionary loading, use a UTF-8 locale and UTF-8 dictionaries; then loading is as fast as pointing into a memory chunk.
The module structure of my application and memory limitations makes impossible to load all modules at startup, that would be too much and I don't even know how much modules will be available some time after the release. Manual control over when to load/unload what is required for my current "big" game.
So some manual control on this side would be of great help. Maybe some kind of strategy could be provided by the user?
Just bind separate chunks to separate domains - but IMHO, just load all dictionaries for the user's locale at once. For example, all the translation strings for Evolution (the biggest dictionary I have found) take about 500K, which is about the size of one hi-res texture. So... such things should be kept in proportion.
C. Dictionary format
You already pointed the way to provide a custom format for dictionaries, so this is good from my point of view. A lot of companies uses simple excell/csv files to manipulate localization texts, making simple to provide texts to translate to localization companies.
Ohhh dear God! NEVER, NEVER do such things! This is exactly why 99% of developers are aware of only 1% of the issues. Example: plural forms. See: <http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f45568161a8cdf4ad2299f6d23> Translating text is much more complicated than taking one string and matching another. Best - use tools like Lokalize, poedit or others; they do a much better job and are much more helpful for translators. That is exactly why, when it comes to localization, you should never give the developer too much freedom, as he would do a crappy job. Always use a library written by an expert.
- the ids have to be strings.
Ok... Yes they have to.
- having the user to provide custom id would help to manage tools/performance on his side
NEVER, NEVER, NEVER use non-string ids for localization. Things like get_resource_string(HELLO_WORLD_CONSTANT) are one of the most terrible solutions in the world; they lead to very hard development work and very bad localization results. Translation should be done from "context + string" to "string", and never through some other kind of id. I'm a performance freak (see CppCMS), but I still think that using strings is fast enough, and that has a much bigger advantage than the few microseconds you could gain. If you want performance, do profiling - I doubt you'll ever find string-id lookup as the bottleneck.
- it assume that the string id will have some context informations allowing to know the right localization needed.
Yes
It looks like a hack to me because I think each unique text should have a unique id.
No! Human languages, unlike programming languages, are context-dependent and ambiguous; the same string may be translated into two different strings depending on context. Small but clear example: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad999083...
The domain string seem to be some hack to fix this case.
You have a misconception - domains are actually application/module names. For example, for Excel you would have an "excel.mo" dictionary, and the domain is "excel". Rationale: usually the dictionaries of many applications are kept in the same place, i.e.:

  /usr/share/locale/ru/LC_MESSAGES/excel.mo
  /usr/share/locale/ru/LC_MESSAGES/word.mo
would prefer some way to get a unique id from each text, provided by the user. As boost::locale follow the gettext philosophy I don't see how it would be possible to change this without changing the backend.
You should not change this - not for coding reasons, but rather for linguistic reasons and the quality of language support.
I planned to write my specific solution for my game's localisation, having a somewhat complex user-provided-module-based structure, but if boost::locale provide a solution for the points I've listed, then I can plug it in my game without a problem and that will simplify a lot of things (assuming performance is correct for my need). For the moment I'll keep following how boost::locale goes until I reach the point where I need to make a final decision.
One additional point to remember: localization is not only about translating strings; translating strings is only one - important but small - part of it. Regards, Artyom

On Sun, Sep 12, 2010 at 06:57, Artyom <artyomtnk@yahoo.com> wrote:
A. Dictionary source :
Currently, if I my understanding is correct, the boost::locale library
will
always assume that dictionary files are on the (standard?) filesystem.
It would be easy to fix.
Great!
For example OGRE (graphic rendering engine) allow loading textures and models from anywhere by providing such a mechanism.
Small notice: unlike textures and other game resources, the text itself is usually very small by nature, as it is only text. And usually games (client side) are played in one language only, so I really doubt that dictionaries should be treated the same way as other resources. For example, the Russian translations of almost all the software on my Debian PC take about 12MB, and that covers about 250 applications.
I agree in principle, but in practice that is really application/game-dependent. Some types of games rely on a lot of text that has to be loaded only when really required and/or from external sources. Even 12MB is huge in some cases. It's really a technical-budget thing. Anyway, I agree that it's not the most common case. MMOs and some external-module-based games/apps are "the exception", you could say, and often require a large amount of memory to run anyway.
B. Dictionary loading control
This is about "when is a dictionary loaded in memory and usable without having to process something first?". If my understanding is correct, boost::locale will automatically load
the
dictionary when needed? I guess it will load the dictionary when the corresponding language/domain will be invoked?
The dictionaries are loaded when locale is created. Usually locale generation is not cheap, so you create it only once at startup and use it.
i.e. when you call
generator gen; gen.add_messages_path("/usr/share/locale"); gen.add_messages_domain("my_cool_app"); std::locale mine = gen("ru_RU.UTF-8")
all dictionaries for application my_cool_app for ru_RU locale are loaded.
Anyway, some ways to manually load and unload dictionaries (or
dictionaries
related to a locale?) would help controlling the application performance/flow.
When you destroy locale the dictionary is destroyed.
Excellent.
For example most games first load all "whole app life" resources on startup, then will load "world-chunk-specific" resources each time it need it and will unload those resources at some point without exiting the whole app,
You can bind world-chunk-specific translation strings to other domain and load it with locale, and then destroy the locale, or cache it or... whatever you do.
If you want good performance of dictionaries loading, use UTF-8 locale and UTF-8 dictionaries then loading it would be as fast as pointing a memory chunk to specific point.
Agreed, I'm already on this path. (After having read your(?) answer on StackOverflow some months ago, I banished wide strings and made everything UTF-8 based.)
The module structure of my application and memory limitations makes impossible to load all modules at startup, that would be too much and I don't even know how much modules will be available some time after the release. Manual control over when to load/unload what is required for my current "big" game.
So some manual control on this side would be of great help. Maybe some kind of strategy could be provided by the user?
Just bind separate chunks to separate domains, but IMHO, just load all dictionaries for user locale at once. For example, all translation string for evolution (the biggest dictionary I had found) take about 500K, it is about a size of one hi-res texture.
Soo... such stuff should be taken in proportion.
You're still assuming here that graphics are the most expensive resource, but in my example they are not. I agree with the general advice; it's just not practical in my specific case. (In fact, it's the first time I'm in a case where it's not good to load everything up front...)
C. Dictionary format
You already pointed the way to provide a custom format for dictionaries, so this is good from my point of view. A lot of companies use simple Excel/CSV files to manipulate localization texts, making it simple to provide texts to translate to localization companies.
Ohhh dear God! NEVER, NEVER do such things!
This is exactly the point why 99% of developers are aware of 1% of issues.
Example: plural forms
See: < http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f4556816...
Translating text is much more complicated then taking one string and matching other.
Best - use tools like Lokalize, poedit or others, they do much better job and much helpful for translators.
That is exactly why when it comes to localization, you should never give developer too much freedom as it would do crappy job. Always use a library written by expert.
I fully agree. I was just pointing out that companies already using tools that their non-technical, non-translation-expert people are used to (whatever the organization of an Excel sheet) could adopt your library while still letting those non-expert people do their work without losing time learning a new tool. Even some translation companies require you to provide data in Excel files. I'm in a position where I can choose whatever translation tools I want; I won't use Excel files.
- the ids have to be strings.
Ok... Yes they have to.
- having the user to provide custom id would help to manage tools/performance on his side
NEVER, NEVER, NEVER use non-string ids for localization.
Things like get_resource_string(HELLO_WORLD_CONSTANT) is one of the most terrible solutions in the world that lead to very hard development work and very bad localization results.
Translation should be done from "context + string" to "string" and never through some kind of other ids.
I'm performance freak (see CppCMS), but I still think that using string is fast enough and has much bigger advantage they few microseconds that you can gain.
You want performance, do profiling, and I doubt if you'll even find translation of string id as bottle neck.
That depends on the use and size of the strings, but you're right for most usages. I've worked on some hardware where it was not the case, but I agree it's not common.
- it assume that the string id will have some context informations allowing to know the right localization needed.
Yes
It looks like a hack to me because I think each unique text should have a unique id.
No! Human languages unlike programming are context dependent and ambiguous, same string may be translated into two different strings depending on context.
Small but clear example:
http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad999083...
I was thinking about a more cultural/language-based example, where it is not only context that makes the translation hard. For example, some expressions that exist in some languages don't exist in others, and just have equivalents that could be used in the given context. Now, as you point out:
The domain string seem to be some hack to fix this case.
You have a misconception - domains are actually application/module names.
So if "domain" are module names, how to differenciate two sentences that are the same in a language with two different contextes, but are not the same in an other language with the same different contextes? (I've seen cases like that but I'm not an expert and I'll have too search for an example I guess...)
From what I understand, I would have to add context information other than the module name to each text?
For example, for Excel you would have "excel.mo" dictionary and the domain is "excel".
Rationale: usually all dictionaries from many applications kept in same place i.e.
/usr/share/locale/ru/LC_MESSAGES/excel.mo /usr/share/locale/ru/LC_MESSAGES/word.mo
would prefer some way to get a unique id from each text, provided by the user. As boost::locale follow the gettext philosophy I don't see how it would be possible to change this without changing the backend.
You should not changes this, not for coding reason, but rather for linguistics reason and quality of language support.
Ok, I think you're the expert here, so I'll follow your advice.
I planned to write my specific solution for my game's localisation,
having a
somewhat complex user-provided-module-based structure, but if boost::locale provide a solution for the points I've listed, then I can plug it in my game without a problem and that will simplify a lot of things (assuming performance is correct for my need). For the moment I'll keep following how boost::locale goes until I reach the point where I need to make a final decision.
One additional point to remember:
localization is not only about translating strings, translating strings is only one, important but small part of it.
I'm aware of that, and text translation is my last problem on this side (thanks to the context of the game and some other libs that make displaying any text easier), but it's always good to remember it. Thanks.
Regards, Artyom

A. Dictionary source :
Currently, if I my understanding is correct, the boost::locale library
will
always assume that dictionary files are on the (standard?) filesystem.
It would be easy to fix.
Great!
But as a small note: you will have to install the dictionaries manually, not via the generator interface, but rather via this interface: <http://cppcms.sourceforge.net/boost_locale/html/gnu__gettext_8hpp-source.html>

I mean, you'll need to initialize the messages_info structure and provide a callback (which would be a member of messages_info):

  boost::function<bool(std::string file_name,std::vector<char> &file)> custom_fs_reader;

And then install the catalogs using:

  std::locale new_locale = std::locale(generated,create_messages_facet<char>(my_message_info));

I don't want to add this to boost::locale::generator, as I don't think it is generally the correct thing to do (it would require implementing a much more complex path).
Agreed, I'm already on this path. (After having read your(?) answer on StackOverflow some months ago I banished wide strings and made all UTF-8 based)
Actually, it was my question, but I fully agree with the answer :-) In fact, I did not really want to implement wide-string support for Boost.Locale, but I'm afraid that without it, Boost.Locale would not pass the review. Also, for Windows development, using UTF-16 might be quite justified, to simplify interaction with the Win32 API.
No! Human languages unlike programming are context dependent and ambiguous, same string may be translated into two different strings depending on context.
Small but clear example:
http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad999083...
I was thinking about more cultural/language based example where there is not only context that make the translation hard. For example some expressions that exists in some languages don't exists in others and just have equivalents that could be used in the given context.
Yes, the example is not the best.
So if "domain" are module names, how to differenciate two sentences that are the same in a language with two different contextes, but are not the same in an other language with the same different contextes?
I don't really understand the question, but maybe this will make it clear. When you translate a message, it is uniquely defined by 4 parameters:

- locale (for example "ru_RU")
- domain (for example "excel")
- context (for example "File Menu")
- id (for example "Open...")

So:

  cout << translate("File Menu","Open...")
       << translate("Internet Connection","Open...")
       << translate("Open...")
       << translate("File Menu","Close")
       << translate("Internet Connection","Close")
       << translate("Close")

requires 6 different entries in the dictionary.

Artyom

But as small note, you will have to install dictionaries manually
and not via generator interface.
But rather via this interface:
< http://cppcms.sourceforge.net/boost_locale/html/gnu__gettext_8hpp-source.htm...
I mean you'll need to initialize the messages info structure and provide a callback (that would be member of messages_info)
boost::function<bool(std::string file_name,std::vector<char> &file)> custom_fs_reader;
And then install the catalogs using:
std::locale new_locale = std::locale(generated,create_messages_facet<char>(my_message_info));
I don't want to add this into boost::locale::generator, as I don't think this is generally correct thing to do (as it would require to implement much more complex path).
I'll try this solution and give feedback later then.
So if "domain" are module names, how to differenciate two sentences that are the same in a language with two different contextes, but are not the same in an other language with the same different contextes?
I don't really understand the question. But maybe this would make it clear:
When you translate a message it is uniquely defined by 4 parameters:
- locale (for example "ru_RU") - domain (for example "excel") - context(for example "File Menu") - id(for example "Open...")
So:
cout << translate("File Menu","Open...") << translate("Internet Connection","Open...") << translate("Open...") << translate("File Menu","Close") << translate("Internet Connection","Close") << translate("Close")
Require 6 different entries in the dictionary
In this example you're only using the context parameter (where there are two parameters), right? I think I mixed up the context parameter and the domain parameter then, yes.

In this example you're only using context parameters (where there is two parametters) right?
The locale is defined by imbuing it to the stream, i.e.:

std::cout.imbue(my_generator("ru_RU.UTF-8"));

Now the default domain is used. If you want to switch the domain you use:

cout << as::domain("excel") << translate("File Menu","Open...");

And the string would be taken, for example, from the /usr/share/locale/ru_RU/LC_MESSAGES/foo.mo dictionary (of course, only if the dictionary for foo was loaded when you defined the domains).

Of course you can specify all 4 parameters explicitly:

std::locale ru_RU_locale = my_generator("ru_RU.UTF-8");
std::string translated_open = translate("File Menu","Open...").str<char>(ru_RU_locale,"excel");

Or a little bit simpler, in gettext style:

std::string translated_open = dpgettext("excel","File Menu","Open...",ru_RU_locale);

But... usually you don't want to :-)

Artyom

On Fri, Sep 10, 2010 at 03:19:22AM -0700, Artyom wrote:
Tutorial: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html
Reference: http://cppcms.sourceforge.net/boost_locale/html/index.html
Downloads: https://sourceforge.net/projects/cppcms/files/boost_locale/
Why does this not use Boost.Build? I'd expect that a library that's submitted for review would be buildable using the current Boost build system. I do not see this addressed in either the documentation or the KNOWN_ISSUES file. Are you hoping that by the time this gets to review, Ryppl has taken over the world?

--
Lars Viklund | zao@acc.umu.se

Why does this not use Boost.Build? I'd expect that a library that's submitted for review would be buildable using the current Boost build system.
To be honest, at the point I submitted the first version for review, BBv2 was totally incapable of doing something useful like finding a library or header in a way a normal BBv2 user could, without rewriting the entire BBv2 system.

Now the library has become even "more complicated": it requires an additional library, iconv, which may be part of libc or may be an external library... such checks are, AFAIK, far beyond the abilities of a basic BBv2 user to implement (not to mention the total lack of basic documentation).

Vladimir is now working on the next version and improving it, so hopefully by the time the library gets to inclusion in Boost, the Boost.Build files will be ready. Meanwhile, just use CMake... if I had waited for BBv2 to be ready, I would not have submitted this library at all.

Finally, I asked on the Boost-Build list for somebody to volunteer and help me with the BB scripts. See: http://thread.gmane.org/gmane.comp.lib.boost.build/23089

So I hope somebody will help me, or BB will become ready for the normal user who thinks that documentation and the basic checks existing in all build systems are probably useful things to have.
I do not see this addressed in either the documentation nor in the KNOWN_ISSUES file.
I mentioned this in the tutorial, in the build instructions.

Artyom
participants (5)
- Artyom
- Cory Nelson
- Klaim
- Lars Viklund
- Mathias Gaunard