Re: [boost] [Locale] Preview of 3rd version

13 Sep 2010


      On Sun, Sep 12, 2010 at 06:57, Artyom <artyomtnk@yahoo.com> wrote:
...
...
A. Dictionary source:
Currently, if my understanding is correct, the boost::locale library will
always assume that dictionary files are on the (standard?) filesystem.
It would be easy to fix.
Great!
...
...
For example, OGRE (a graphics rendering engine) allows loading textures and
models from anywhere by providing such a mechanism.
A small note: unlike textures and other game resources, text is usually very
small by nature, as it is only text. And games (client side) are usually
played in one language only, so I really doubt that text should be treated
the same way as other resources. For example, the Russian translations of
almost all software on my Debian PC take about 12MB, and this is about 250
applications.
I agree in principle, but in practice it really depends on the
application/game. Some types of games rely on a lot of text that has to be
loaded only if really required and/or from external sources. Even 12MB is
huge in some cases. It's really a matter of technical budget.
Anyway, I agree that it's not the most common case. MMOs and some games/apps
based on external modules are "the exception", you could say, and often
require a large amount of memory to run anyway.
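The sketch below shows the OGRE-style idea of loading dictionaries "from anywhere" in C++. In this design the library asks the application for a dictionary's bytes and does not open a file itself. `dictionary_source` and `memory_source` are hypothetical names used only for illustration. They are not part of Boost.Locale.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical interface (not part of Boost.Locale): the application decides
// where dictionary bytes come from (file system, archive, network, ...).
class dictionary_source {
public:
    virtual ~dictionary_source() = default;
    // Returns the raw bytes of a dictionary file, or std::nullopt if absent.
    virtual std::optional<std::vector<char>> load(const std::string& path) const = 0;
};

// Example source serving dictionaries from an in-memory table, e.g. files
// already extracted from a game's own resource package.
class memory_source : public dictionary_source {
public:
    void add(const std::string& path, const std::string& bytes) {
        files_[path] = std::vector<char>(bytes.begin(), bytes.end());
    }

    std::optional<std::vector<char>> load(const std::string& path) const override {
        auto it = files_.find(path);
        if (it == files_.end()) return std::nullopt;
        return it->second;  // copies the bytes out to the caller
    }

private:
    std::map<std::string, std::vector<char>> files_;
};
```

A game could implement the same interface on top of its own package format or a network download. A library that accepts such an object does not need to know which kind of source it received.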
...
...
B. Dictionary loading control
This is about "when is a dictionary loaded in memory and usable without
having to process something first?".
If my understanding is correct, boost::locale will automatically load the
dictionary when needed? I guess it will load the dictionary when the
corresponding language/domain is invoked?
The dictionaries are loaded when the locale is created. Locale generation is
usually not cheap, so you create it only once at startup and then use it.
I.e. when you call

  boost::locale::generator gen;
  gen.add_messages_path("/usr/share/locale");
  gen.add_messages_domain("my_cool_app");
  std::locale mine = gen("ru_RU.UTF-8");

all dictionaries of the application my_cool_app for the ru_RU locale are loaded.
...
Anyway, some way to manually load and unload dictionaries (or dictionaries
related to a locale?) would help control the application's performance/flow.
When you destroy the locale, the dictionary is destroyed.
Excellent.
...
...
For example, most games first load all "whole app life" resources on
startup, then load "world-chunk-specific" resources each time they need
them, and unload those resources at some point without exiting the whole
app.
You can bind world-chunk-specific translation strings to another domain,
load it with a locale, and then destroy the locale, or cache it, or...
whatever you do.
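The "one domain per world chunk" suggestion can be modeled as a small cache. It loads a chunk's strings the first time they are used and forgets them when the chunk is unloaded. `catalog`, `chunk_catalogs` and the loader callback are hypothetical names. With Boost.Locale, the cached object would presumably be the generated std::locale rather than a plain map.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical catalog type: translated strings of one domain (one world chunk).
using catalog = std::map<std::string, std::string>;

// Sketch of user-controlled loading: each world chunk's strings live in their
// own domain, loaded on first use and dropped when the chunk is unloaded.
class chunk_catalogs {
public:
    using loader = std::function<catalog(const std::string& domain)>;
    explicit chunk_catalogs(loader load) : load_(std::move(load)) {}

    // Loads the domain if it is not resident yet; returns the shared catalog.
    std::shared_ptr<const catalog> acquire(const std::string& domain) {
        auto it = loaded_.find(domain);
        if (it != loaded_.end()) return it->second;
        std::shared_ptr<const catalog> cat = std::make_shared<catalog>(load_(domain));
        loaded_.emplace(domain, cat);
        return cat;
    }

    // Forgets the domain; memory is freed once no caller still holds it.
    void release(const std::string& domain) { loaded_.erase(domain); }

    bool resident(const std::string& domain) const { return loaded_.count(domain) != 0; }

private:
    loader load_;
    std::map<std::string, std::shared_ptr<const catalog>> loaded_;
};
```

Shared ownership keeps the design simple. A chunk that is being unloaded can still finish with strings it already holds, and the memory goes away once the last holder drops its pointer.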
If you want good dictionary-loading performance, use a UTF-8 locale and
UTF-8 dictionaries; then loading is as fast as pointing to a specific place
in a memory chunk.
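The following sketch shows why a UTF-8 dictionary can be used almost as-is. It performs a zero-copy lookup over a GNU .mo image held in memory. It assumes a little-endian file and ignores the optional hash table, plural entries and contexts. `mo_view` and `build_mo` are hypothetical helpers, not Boost.Locale code. The original strings are stored sorted, so a binary search over the offset table is enough. The result is a `std::string_view` that points into the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reads a little-endian 32-bit value at byte offset `off` (the sketch does
// no bounds checking and does not handle big-endian files).
inline std::uint32_t read_u32(std::string_view buf, std::size_t off) {
    const auto* p = reinterpret_cast<const unsigned char*>(buf.data() + off);
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Zero-copy view over a .mo image already in memory: lookups return
// string_views into the buffer; nothing is decoded or copied.
class mo_view {
public:
    explicit mo_view(std::string_view image) : img_(image) {}

    bool valid() const { return img_.size() >= 28 && read_u32(img_, 0) == 0x950412deu; }

    std::optional<std::string_view> translate(std::string_view msgid) const {
        const std::uint32_t n = read_u32(img_, 8);
        const std::uint32_t orig = read_u32(img_, 12), trans = read_u32(img_, 16);
        // Original strings are sorted, so binary search finds the entry.
        std::uint32_t lo = 0, hi = n;
        while (lo < hi) {
            const std::uint32_t mid = lo + (hi - lo) / 2;
            const std::string_view key = entry(orig, mid);
            if (key == msgid) return entry(trans, mid);
            if (key < msgid) lo = mid + 1; else hi = mid;
        }
        return std::nullopt;
    }

private:
    // Each table entry is a (length, offset) pair locating a string in the image.
    std::string_view entry(std::uint32_t table, std::uint32_t i) const {
        return img_.substr(read_u32(img_, table + 8 * i + 4), read_u32(img_, table + 8 * i));
    }
    std::string_view img_;
};

// Builds a minimal little-endian .mo image for testing; msgids must already
// be sorted, and no hash table is emitted (hash size 0).
inline std::string build_mo(const std::vector<std::pair<std::string, std::string>>& msgs) {
    std::string out;
    auto put = [&out](std::uint32_t v) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xffu));
    };
    const auto n = static_cast<std::uint32_t>(msgs.size());
    const std::uint32_t orig = 28, trans = orig + 8 * n, strings = trans + 8 * n;
    // Header: magic, revision, N, original table, translation table, hash size, hash offset.
    put(0x950412deu); put(0); put(n); put(orig); put(trans); put(0); put(strings);
    std::string blob;  // NUL-terminated strings stored after both tables
    auto add_table = [&](bool originals) {
        for (const auto& m : msgs) {
            const std::string& s = originals ? m.first : m.second;
            put(static_cast<std::uint32_t>(s.size()));
            put(strings + static_cast<std::uint32_t>(blob.size()));
            blob += s;
            blob += '\0';
        }
    };
    add_table(true);   // original-strings table
    add_table(false);  // translation table
    return out + blob;
}
```

When the dictionary's encoding already matches the locale's (UTF-8 on both sides), nothing has to be converted at load time. Otherwise, every string would need to be recoded before it could be used.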
Agreed, I'm already on this path. (After reading your(?) answer on
StackOverflow some months ago, I banished wide strings and made everything
UTF-8 based.)
...
...
The module structure of my application and memory limitations make it
impossible to load all modules at startup: that would be too much, and I
don't even know how many modules will be available some time after the
release. Manual control over when to load/unload what is required for my
current "big" game.
So some manual control on this side would be of great help.
Maybe some kind of strategy could be provided by the user?
Just bind separate chunks to separate domains, but IMHO, just load all
dictionaries for the user's locale at once. For example, all translation
strings for Evolution (the biggest dictionary I have found) take about 500K,
which is about the size of one hi-res texture.
Soo... such stuff should be kept in proportion.
You're still assuming here that graphics are the most expensive resource in
my example, but they are not. I agree with the general advice; it's just not
practical in my specific case. (In fact, it's the first time I've been in a
case where it's not good to load everything first...)
...
...
C. Dictionary format
You already pointed out the way to provide a custom format for dictionaries;
this is good from my point of view.
A lot of companies use simple Excel/CSV files to manipulate localization
texts, making it simple to provide texts to translate to localization
companies.
Ohhh dear God!
NEVER, NEVER do such things!
This is exactly why 99% of developers are aware of only 1% of the issues.
Example: plural forms
See:
http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f4556816...
...
Translating text is much more complicated than taking one string and
matching it to another.
Best: use tools like Lokalize, Poedit or others; they do a much better job
and are much more helpful for translators.
That is exactly why, when it comes to localization, you should never give
developers too much freedom, as they would do a crappy job. Always use a
library written by an expert.
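Plural forms show concretely why a one-to-one string table falls short. English has two forms, but Russian needs three, and a rule on the number decides which one applies. The sketch hard-codes the Plural-Forms expression commonly used in Russian gettext catalogs. `russian_plural_index` and `russian_files` are illustrative names. A spreadsheet with one cell per English string has no natural place for the third form or for the rule.

```cpp
#include <string>
#include <vector>

// Plural-form selection for Russian, following the usual gettext header:
//   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
//                       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
int russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}

// The English pair "file"/"files" needs three Russian forms.
std::string russian_files(unsigned long n) {
    static const std::vector<std::string> forms = {"файл", "файла", "файлов"};
    return std::to_string(n) + " " + forms[russian_plural_index(n)];
}
```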
I fully agree.
I was just pointing out that companies already using tools that their
non-technical, non-translation-expert staff are used to (whatever the
organisation of an Excel sheet) could use your library while those
non-expert people keep doing their work without losing time learning a new
tool.
Even some translation companies require you to provide data in Excel
files.

I'm in a position where I can choose whatever translation tools I want, so I
won't use Excel files.
...
- the ids have to be strings.
Ok... Yes, they have to.
...
- having the user provide
custom ids would help manage tools/performance on his side
NEVER, NEVER, NEVER use non-string ids for localization.
Things like get_resource_string(HELLO_WORLD_CONSTANT)
are among the most terrible solutions in the world; they lead
to very hard development work and very bad localization results.
Translation should be done from "context + string" to "string" and never
through some other kind of id.
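Here is a minimal sketch of a "context + string" to "string" lookup. The key follows the convention GNU gettext uses inside .mo files for messages that have a context: the context, then the byte 0x04, then the original string. `context_catalog` is a hypothetical class. Because the key includes the context, the same English word can get two translations. The translator still sees the real text, not a constant such as HELLO_WORLD_CONSTANT.

```cpp
#include <map>
#include <string>

// Sketch of a "context + string" -> "string" catalog. The key mirrors the
// gettext .mo convention: context, EOT character '\x04', original string.
class context_catalog {
public:
    void add(const std::string& context, const std::string& msgid, const std::string& msgstr) {
        table_[key(context, msgid)] = msgstr;
    }

    // Falls back to the original text when no translation exists.
    std::string get(const std::string& context, const std::string& msgid) const {
        auto it = table_.find(key(context, msgid));
        return it != table_.end() ? it->second : msgid;
    }

private:
    static std::string key(const std::string& context, const std::string& msgid) {
        return context.empty() ? msgid : context + '\x04' + msgid;
    }
    std::map<std::string, std::string> table_;
};
```

For example, a French catalog can map "Open" to "Ouvrir" in a "verb" context and to "Ouvert" in a "state" context.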
I'm a performance freak (see CppCMS), but I still think that using strings
is fast enough and has a much bigger advantage than the few microseconds you
can gain. If you want performance, do profiling, and I doubt you'll even
find the translation of string ids to be a bottleneck.
That depends on the use and size of the strings, but you're right for most
uses. I've worked on some hardware where it was not the case, but I agree
it's not common.
...
...
- it assumes that the string id will have some context information
allowing the right localization to be determined.
Yes
...
It looks like a hack to me because I
think each unique text should have a unique id.
No! Human languages, unlike programming languages, are context dependent and
ambiguous: the same string may be translated into two different strings
depending on context.
Small but clear example:
http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad999083...
I was thinking about more cultural/language-based examples where context is
not the only thing that makes the translation hard.
For example, some expressions that exist in some languages don't exist in
others and just have equivalents that could be used in the given context.
Now, as you point out:
...
...
The domain string seems to be some hack to fix this case.
You have a misconception: domains are actually application/module names.
So if "domains" are module names, how do you differentiate two sentences that
are the same in one language with two different contexts, but are not the
same in another language with the same contexts? (I've seen cases like that,
but I'm not an expert and I'll have to search for an example, I guess...)
...
From what I understand, I would have to add additional context information
other than the module name to each text?
...
For example, for Excel you would have an "excel.mo" dictionary and the domain
is "excel".
Rationale: usually the dictionaries of many applications are kept in the
same place, i.e.
/usr/share/locale/ru/LC_MESSAGES/excel.mo
/usr/share/locale/ru/LC_MESSAGES/word.mo
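A domain is just the dictionary's file name, so finding the dictionary is mostly a matter of building a path. The sketch assumes the layout quoted above plus a simple fallback from "ru_RU" to "ru". Real gettext-style search rules also take codesets and modifiers into account. `candidate_paths` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Candidate dictionary files for one domain, most specific first.
// Assumption: "ru_RU" falls back to the plain "ru" directory, roughly as
// gettext-style lookups do; real search rules are richer than this.
std::vector<std::string> candidate_paths(const std::string& base,
                                         const std::string& locale_name,
                                         const std::string& domain) {
    std::vector<std::string> langs{locale_name};
    const auto underscore = locale_name.find('_');
    if (underscore != std::string::npos) langs.push_back(locale_name.substr(0, underscore));
    std::vector<std::string> paths;
    for (const auto& lang : langs)
        paths.push_back(base + "/" + lang + "/LC_MESSAGES/" + domain + ".mo");
    return paths;
}
```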
...
I would prefer some way to get a unique id for each text, provided by the
user. As boost::locale follows the gettext philosophy, I don't see how it
would be possible to change this without changing the backend.
You should not change this, not for coding reasons, but rather for
linguistic reasons and the quality of language support.
Ok, I think you're the expert here so I'll follow your advice.
...
...
I planned to write my own solution for my game's localisation, which has a
somewhat complex user-provided-module-based structure, but if boost::locale
provides a solution for the points I've listed, then I can plug it into my
game without a problem, and that will simplify a lot of things (assuming
performance is adequate for my needs). For the moment I'll keep following
how boost::locale goes until I reach the point where I need to make a final
decision.
One additional point to remember:
localization is not only about translating strings;
translating strings is only one important, but small, part of it.
I'm aware of that, and text translation is my last problem on this side
(thanks to the context of the game and some other libs that make displaying
any text easier), but it's always good to remember it, thanks.
...
Regards,
  Artyom

Klaim