
On Sun, Sep 12, 2010 at 06:57, Artyom <artyomtnk@yahoo.com> wrote:
A. Dictionary source:
Currently, if my understanding is correct, the boost::locale library will always assume that dictionary files are on the (standard?) filesystem.
It would be easy to fix.
Great!
For example, OGRE (a graphics rendering engine) allows loading textures and models from anywhere by providing such a mechanism.
Small note: unlike textures and other game resources, the text itself is usually very small by nature, as it is only text. And games (on the client side) are usually played in one language only, so I really doubt they should be treated the same way as other resources. For example, the Russian translations of almost all the software on my Debian PC take about 12 MB, and that is about 250 applications.
I agree in principle, but in practice that really depends on the application/game. Some types of games rely on a lot of text that has to be loaded only when really required and/or from external sources. Even 12 MB is huge in some cases; it's really a matter of technical budget. Anyway, I agree that it's not the most common case. MMOs and games/apps based on external modules are "the exception", you could say, and they often require a large amount of memory to run anyway.
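To make the request concrete, here is a minimal sketch of the kind of pluggable dictionary source being discussed, in the spirit of the OGRE resource mechanism. It uses only the standard library; `DictionarySource` and `MemoryArchiveSource` are hypothetical names, not part of boost::locale, and the in-memory map stands in for a game archive or a network fetch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical interface: the application decides where dictionary bytes come
// from (plain files, a game archive, the network...). Not a boost::locale API.
class DictionarySource {
public:
    virtual ~DictionarySource() = default;
    // Returns the raw bytes of the dictionary stored under `path`,
    // or std::nullopt if this source does not have it.
    virtual std::optional<std::vector<char>> load(const std::string& path) = 0;
};

// Example implementation backed by an in-memory table, standing in for a
// resource archive that the game has already read into memory.
class MemoryArchiveSource : public DictionarySource {
public:
    void add(const std::string& path, const std::string& contents) {
        files_[path] = std::vector<char>(contents.begin(), contents.end());
    }

    std::optional<std::vector<char>> load(const std::string& path) override {
        auto it = files_.find(path);
        if (it == files_.end()) return std::nullopt;
        return it->second;  // hands a copy of the bytes to the caller
    }

private:
    std::map<std::string, std::vector<char>> files_;
};
```

With a hook of this shape, whatever parses the dictionary would only ever ask for bytes by path, so where those bytes actually live stays the application's business.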
B. Dictionary loading control
This is about "when is a dictionary loaded in memory and usable without having to process something first?". If my understanding is correct, boost::locale will automatically load the dictionary when needed? I guess it will load the dictionary when the corresponding language/domain is invoked?
The dictionaries are loaded when the locale is created. Locale generation is usually not cheap, so you create the locale only once at startup and then use it.
For example, when you call

    boost::locale::generator gen;
    gen.add_messages_path("/usr/share/locale");
    gen.add_messages_domain("my_cool_app");
    std::locale mine = gen("ru_RU.UTF-8");

all dictionaries of the my_cool_app application for the ru_RU locale are loaded.
Anyway, some way to manually load and unload dictionaries (or dictionaries related to a locale?) would help control the application's performance/flow.
When you destroy the locale, the dictionary is destroyed.
Excellent.
For example, most games first load all "whole app life" resources on startup, then load "world-chunk-specific" resources each time they need them, and unload those resources at some point without exiting the whole app.
You can bind world-chunk-specific translation strings to a separate domain and load it with a locale, and then destroy the locale, cache it, or... whatever you need.
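A minimal sketch of that load-on-demand, destroy-or-cache pattern, assuming one domain per world chunk. `Catalog` and `DomainCache` are hypothetical stand-ins (with boost::locale the cached object would be the std::locale generated for that domain), and the loader callback is where the application's own loading strategy plugs in.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for whatever owns a loaded dictionary.
struct Catalog {
    std::map<std::string, std::string> entries;
};

// Keeps one catalog per domain; the game decides when to load and release.
class DomainCache {
public:
    using Loader = std::function<std::shared_ptr<const Catalog>(const std::string& domain)>;

    explicit DomainCache(Loader loader) : loader_(std::move(loader)) {}

    // Loads the domain on first use and returns the cached catalog afterwards.
    std::shared_ptr<const Catalog> acquire(const std::string& domain) {
        auto it = loaded_.find(domain);
        if (it != loaded_.end()) return it->second;
        auto catalog = loader_(domain);
        loaded_.emplace(domain, catalog);
        return catalog;
    }

    // Drops the cache's reference; memory is freed once no caller holds one.
    void release(const std::string& domain) { loaded_.erase(domain); }

    bool is_loaded(const std::string& domain) const { return loaded_.count(domain) != 0; }

private:
    Loader loader_;
    std::map<std::string, std::shared_ptr<const Catalog>> loaded_;
};
```

Because entries are shared pointers, `release` only drops the cache's own reference; code that is still using a chunk's catalog keeps it alive until it is done.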
If you want dictionary loading to be fast, use a UTF-8 locale and UTF-8 dictionaries; then loading is about as fast as pointing to a specific place in a memory chunk.
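Why this is cheap can be sketched with the GNU .mo format itself: the file is a header, two tables of (length, offset) pairs, and NUL-terminated strings. Once the bytes are in memory and already in the locale's encoding, a lookup can return a view straight into the buffer. This is an illustrative reader, not boost::locale's implementation; `MoView` is a hypothetical name, and the sketch assumes a little-endian file and does a linear search instead of using the file's hash table.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Read-only view over a .mo catalog that is already in memory. Lookups return
// string_views into the buffer: nothing is copied or converted.
class MoView {
public:
    explicit MoView(std::string_view data) : data_(data) {
        // Header: magic, revision, N, offset of originals, offset of translations, ...
        if (data_.size() < 28 || read_u32(0) != 0x950412deu)
            throw std::runtime_error("not a little-endian .mo file");
        count_ = read_u32(8);
        originals_ = read_u32(12);
        translations_ = read_u32(16);
    }

    // Linear scan for simplicity; a real reader would use the hash table.
    std::optional<std::string_view> find(std::string_view msgid) const {
        for (std::uint32_t i = 0; i < count_; ++i) {
            if (entry(originals_, i) == msgid) return entry(translations_, i);
        }
        return std::nullopt;
    }

private:
    // Reads a little-endian 32-bit integer at `offset`.
    std::uint32_t read_u32(std::size_t offset) const {
        std::uint32_t v = 0;
        for (int b = 3; b >= 0; --b)
            v = (v << 8) | static_cast<unsigned char>(data_.at(offset + b));
        return v;
    }

    // Each table entry is a (length, offset) pair of 32-bit integers.
    std::string_view entry(std::uint32_t table, std::uint32_t i) const {
        std::uint32_t len = read_u32(table + 8 * i);
        std::uint32_t off = read_u32(table + 8 * i + 4);
        return data_.substr(off, len);
    }

    std::string_view data_;
    std::uint32_t count_ = 0, originals_ = 0, translations_ = 0;
};
```

A production reader would also have to accept big-endian files and check every offset against the buffer size before trusting it.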
Agreed, I'm already on this path. (After reading your(?) answer on StackOverflow some months ago, I banished wide strings and made everything UTF-8 based.)
The module structure of my application and its memory limitations make it impossible to load all modules at startup: that would be too much, and I don't even know how many modules will be available some time after release. Manual control over when to load/unload what is required for my current "big" game.
So some manual control on this side would be of great help. Maybe some kind of strategy could be provided by the user?
Just bind separate chunks to separate domains. But IMHO, simply load all dictionaries for the user's locale at once. For example, all the translation strings for Evolution (the biggest dictionary I have found) take about 500 KB, which is about the size of one hi-res texture.
So... such things should be kept in proportion.
You're still assuming here that graphics are the most expensive resource in my example, but they are not. I agree with the general advice; it's just not practical in my specific case. (In fact, it's the first time I've been in a situation where loading everything up front is not a good idea...)
C. Dictionary format
You already pointed out a way to provide a custom format for dictionaries, which is good from my point of view. A lot of companies use simple Excel/CSV files to manage localization texts, which makes it simple to send the texts to localization companies for translation.
Ohhh dear God! NEVER, NEVER do such things!
This is exactly the point: 99% of developers are aware of only 1% of the issues.
Example: plural forms
See: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f4556816...
Translating text is much more complicated than taking one string and matching it to another.
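To show why one-cell-per-string spreadsheets break down, here is the Russian plural rule as it appears in the Plural-Forms header of a gettext catalog, written out in C++. A single English pair ("file"/"files") needs three Russian forms chosen by this rule. boost::locale applies such rules for you through its plural-aware `translate(singular, plural, n)`, so this is only an illustration; `russian_plural_index` and `russian_files` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <string>

// Russian plural rule from a gettext Plural-Forms header:
//   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
//                       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
int russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}

// One English pair needs three Russian forms, so a single
// "source -> translation" cell cannot represent it.
std::string russian_files(unsigned long n) {
    static const std::array<const char*, 3> forms = {"файл", "файла", "файлов"};
    return std::to_string(n) + " " + forms[russian_plural_index(n)];
}
```

Note that 11 and 111 take the "many" form while 21 takes the "one" form, which is exactly the kind of detail a hand-made spreadsheet tends to miss.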
Best: use tools like Lokalize, Poedit or others; they do a much better job and are much more helpful for translators.
That is exactly why, when it comes to localization, you should never give developers too much freedom, as they would do a crappy job. Always use a library written by experts.
I fully agree. I was just pointing out that companies whose non-technical, non-translation-expert staff are used to certain tools (whatever the organisation of their Excel sheets) could use your library while those people keep doing their work without losing time learning a new tool. Some translation companies even require you to provide data in Excel files. Since I'm in a position where I can choose any translation tools, I won't use Excel files.
- the ids have to be strings.
OK... Yes, they have to.
- letting the user provide custom ids would help manage tools/performance on their side
NEVER, NEVER, NEVER use non-string ids for localization.
Things like get_resource_string(HELLO_WORLD_CONSTANT) are among the most terrible solutions in the world; they lead to very hard development work and very bad localization results.
Translation should be done from "context + string" to "string", and never through any other kind of id.
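A toy illustration of "context + string" to "string", using only the standard library. It mirrors the way gettext .mo files store a message context (the context, then the EOT byte `\x04`, then the original string) and falls back to the original string when no translation exists. `ContextCatalog` is a hypothetical name; in boost::locale itself you would simply pass the context as the first argument of `translate(context, message)`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy catalog keyed by (context, msgid), encoded the gettext way:
// "context" + '\x04' + "msgid" when a context is present.
class ContextCatalog {
public:
    void add(const std::string& context, const std::string& msgid,
             const std::string& translation) {
        entries_[key(context, msgid)] = translation;
    }

    // Falls back to the source string when there is no translation.
    std::string translate(const std::string& context, const std::string& msgid) const {
        auto it = entries_.find(key(context, msgid));
        return it != entries_.end() ? it->second : msgid;
    }

private:
    static std::string key(const std::string& context, const std::string& msgid) {
        return context.empty() ? msgid : context + '\x04' + msgid;
    }

    std::map<std::string, std::string> entries_;
};
```

The same English "Open" can then map to different Russian words depending on whether it labels an action or a state.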
I'm a performance freak (see CppCMS), but I still think that using strings is fast enough, and it has advantages much bigger than the few microseconds you could gain.
If you want performance, do profiling; I doubt you'll even find string-id translation to be a bottleneck.
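If you want a first number before reaching for a real profiler, a crude timing loop gives the order of magnitude of string-keyed lookups. `average_lookup_ns` is a hypothetical helper over `std::unordered_map`, not how boost::locale stores its catalogs, and the result depends entirely on the machine, compiler flags and key lengths.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Average time per lookup, in nanoseconds, over `rounds` passes through `keys`.
// `hits` counts successful lookups so the work cannot be optimized away.
double average_lookup_ns(const std::unordered_map<std::string, std::string>& catalog,
                         const std::vector<std::string>& keys, int rounds,
                         std::size_t& hits) {
    hits = 0;
    if (keys.empty() || rounds <= 0) return 0.0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r)
        for (const auto& k : keys)
            hits += catalog.count(k);
    auto elapsed = std::chrono::steady_clock::now() - start;
    double ns = std::chrono::duration<double, std::nano>(elapsed).count();
    return ns / (static_cast<double>(rounds) * keys.size());
}
```

Comparing that figure with the number of strings a frame actually translates is usually enough to see whether lookups matter at all; a profiler on the real application remains the authoritative answer.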
That depends on the usage and the size of the strings, but you're right for most uses. I've worked on some hardware where that was not the case, but I agree it's not common.
- it assumes that the string id carries some context information that identifies the right localization.
Yes
It looks like a hack to me because I think each unique text should have a unique id.
No! Human languages, unlike programming languages, are context-dependent and ambiguous: the same string may be translated into two different strings depending on context.
Small but clear example:
http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad999083...
I was thinking more of cultural/language-based examples where context is not the only thing that makes translation hard. For example, some expressions that exist in some languages don't exist in others and only have equivalents that can be used in the given context. Now, as you point out:
The domain string seems to be a hack to fix this case.
You have a misconception: domains are actually application/module names.
So if domains are module names, how do I differentiate two sentences that are identical in one language in two different contexts, but differ in another language in those same contexts? (I've seen cases like that, but I'm not an expert, so I guess I'd have to search for an example...)
From what I understand, I would have to add context information other than the module name to each text?
For example, for Excel you would have an "excel.mo" dictionary, and the domain is "excel".
Rationale: dictionaries from many applications are usually kept in the same place, e.g.

    /usr/share/locale/ru/LC_MESSAGES/excel.mo
    /usr/share/locale/ru/LC_MESSAGES/word.mo
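A small sketch of that layout, building candidate dictionary paths from a root directory, a locale and a domain. The fallback order (full `ll_CC` name first, then the bare language) is an assumption of this sketch modelled on common gettext behaviour, and `dictionary_candidates` is a hypothetical helper (C++17 `std::filesystem`).

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Candidate paths following the gettext layout:
//   <root>/<locale>/LC_MESSAGES/<domain>.mo
// Sketch assumption: try "ll_CC" first, then the bare language "ll".
std::vector<std::filesystem::path> dictionary_candidates(const std::filesystem::path& root,
                                                         const std::string& language,
                                                         const std::string& country,
                                                         const std::string& domain) {
    std::vector<std::filesystem::path> result;
    if (!country.empty())
        result.push_back(root / (language + "_" + country) / "LC_MESSAGES" / (domain + ".mo"));
    result.push_back(root / language / "LC_MESSAGES" / (domain + ".mo"));
    return result;
}
```

Because the domain only selects the file name, "excel" and "word" can share one directory tree without their strings colliding.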
I would prefer some way to get a unique, user-provided id for each text. As boost::locale follows the gettext philosophy, I don't see how this could be changed without changing the backend.
You should not change this, not for coding reasons, but for linguistic reasons and the quality of language support.
OK, I think you're the expert here, so I'll follow your advice.
I planned to write my own solution for my game's localisation, which has a somewhat complex user-provided-module-based structure, but if boost::locale provides a solution for the points I've listed, then I can plug it into my game without a problem, and that will simplify a lot of things (assuming performance is adequate for my needs). For the moment, I'll keep following how boost::locale evolves until I reach the point where I need to make a final decision.
One additional point to remember:
localization is not only about translating strings; translating strings is only one important, but small, part of it.
I'm aware of that, and text translation is my last problem on this side (thanks to the context of the game and some other libs that make displaying any text easier), but it's always good to be reminded, thanks.
Regards, Artyom