Re: [boost] [Locale] Preview of 3rd version

11 Sep 2010

Lars listed some of the reasons I had in mind when asking for more
flexibility around the dictionary loading features, so I will try to make
myself clearer and more specific.

After some thought, I'll try to explain what I know I would need from a
localisation library, based on my limited experience with some in-house
tools at work. In video games we don't use a "standard" localization tool,
because most available tools - as you already mentioned - assume a classic
filesystem and have other inflexibilities that are incompatible with some
game engine architectures, even for desktop games. So this kind of tool is
often made in-house. The customization points I need are:

 A. Dictionary source
 B. Dictionary loading control
 C. Dictionary format

These customization points should be independent of each other (orthogonal?).

A. Dictionary source :

Currently, if my understanding is correct, the boost::locale library will
always assume that dictionary files are on the (standard?) filesystem.
As Lars and others already said, there are some cases (embedded and games)
where this is simply not the case: the game structure requires getting data
from somewhere else, such as baked resource packs, the network or RAM.
I've seen some domain-specific libraries, used in the video game industry
and other industries, that provide a simple way to solve this without
having to handle every case themselves:
 1. They provide a way to load domain-specific data (in our case,
dictionaries) from any source by allowing the user to provide a custom "data
stream" class that the library will use to pull/read the data.
    For example, OGRE (a graphics rendering engine) allows loading textures
and models from anywhere by providing such a mechanism. Some people use it to
feed the engine with textures and meshes from the network, with a
central server providing the resources to the clients (for some simulation
applications, if I remember correctly).
 2. They provide "helper" functions that assume that the data is on the
file system, simplifying the use of dictionary files when no custom
data source is required.

That makes those libraries data-source-independent. In fact, I just realised
that all the libraries I have used so far (for games and otherwise) provide
such a way to plug any source of data into the library. It also gives the
user an easy way to change the data source later if needed.

B. Dictionary loading control

This is about "when is a dictionary loaded in memory and usable without
having to process something first?".
If my understanding is correct, boost::locale will automatically load the
dictionary when needed? I guess it will load the dictionary when the
corresponding language/domain is first requested?

Anyway, some way to manually load and unload dictionaries (or the
dictionaries related to a locale?) would help control the application's
performance/flow. For example, most games first load all "whole app life"
resources on startup, then load "world-chunk-specific" resources each
time they are needed, and unload those resources at some point without
exiting the whole app, just to free the memory for another world chunk.
Once a world chunk has been loaded, there should be no
allocation/deallocation, because it would easily slow down the frame rate and
make it fluctuate in an unpleasant way.
In my own case I also have to manage user-made modules that carry some
localization information, which should be loaded when the module is used but
not otherwise.
The module structure of my application and its memory limitations make it
impossible to load all modules at startup: that would be too much, and I
don't even know how many modules will be available some time after the
release. Manual control over when to load/unload what is required for my
current "big" game.

So some manual control on this side would be of great help.
Maybe some kind of strategy could be provided by the user?
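
As a rough sketch of the kind of manual control I mean (again, not
boost::locale API: DictionaryRegistry and its member names are hypothetical),
the application decides when a dictionary becomes resident and when it is
released, and a lookup never triggers a load behind its back:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical dictionary type: message id -> translated text.
using Dictionary = std::map<std::string, std::string>;

class DictionaryRegistry {
public:
    // Explicitly make a dictionary resident, e.g. when a module or a
    // world chunk is loaded.
    void load(const std::string& name, Dictionary dict) {
        loaded_[name] = std::move(dict);
    }

    // Explicitly release a dictionary; returns true if it was resident.
    bool unload(const std::string& name) {
        return loaded_.erase(name) > 0;
    }

    bool is_loaded(const std::string& name) const {
        return loaded_.count(name) > 0;
    }

    // Lookup only consults resident dictionaries. If the dictionary or the
    // id is missing, the id itself is returned as a fallback.
    std::string translate(const std::string& name, const std::string& id) const {
        const auto d = loaded_.find(name);
        if (d == loaded_.end()) return id;
        const auto t = d->second.find(id);
        return t == d->second.end() ? id : t->second;
    }

private:
    std::map<std::string, Dictionary> loaded_;
};
```

A user-provided "strategy" could then be layered on top: the strategy
object would decide when to call load and unload, while the registry itself
stays passive.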

C. Dictionary format

You already pointed out the way to provide a custom format for dictionaries,
so this is good from my point of view. A lot of companies use simple
Excel/CSV files to manage localization texts, which makes it simple to hand
texts over to localization companies for translation.
I would only criticize the "domain" thing, but that's a gettext philosophy
thing (read further).

So those points of customization would be necessary for almost all my own
projects (games or not). I understand that I might not be in the general
case - not sure about this. However I think all programming libraries should
be data-source agnostic at least.

That said, I have to say that I have often searched for good alternatives to
gettext, as it always seemed inadequate for my use (at work AND at
home) for reasons other than the previous ones:
 - the ids have to be strings (or am I wrong?) - letting the user provide
custom ids would help him manage tools/performance on his side
 - it assumes that the string id carries some context information that makes
it possible to pick the right localization. It looks like a hack to me,
because I think each unique text should have a unique id. That way you can
have the same English words with different ids, allowing different words in
another language. The domain string seems to be a hack to handle this case. I
would prefer some way to get a unique id for each text, provided by the
user. As boost::locale follows the gettext philosophy, I don't see how it
would be possible to change this without changing the backend.
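
A small sketch of what I mean by user-provided unique ids. Everything here
(TextId, IdDictionary, text_for and the sample texts) is hypothetical and
unrelated to gettext or boost::locale. Two ids share the same English text
but map to different French texts, without any context string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical user-chosen ids: one id per unique text, not per English string.
enum class TextId : std::uint32_t {
    MenuOpen = 1,  // "Open" as a command (open a file)
    DoorOpen = 2   // "Open" as a state (the door is open)
};

// Hash functor so the enum can be used as an unordered_map key in C++11.
struct TextIdHash {
    std::size_t operator()(TextId id) const {
        return std::hash<std::uint32_t>()(static_cast<std::uint32_t>(id));
    }
};

using IdDictionary = std::unordered_map<TextId, std::string, TextIdHash>;

IdDictionary make_english() {
    return IdDictionary{{TextId::MenuOpen, "Open"}, {TextId::DoorOpen, "Open"}};
}

IdDictionary make_french() {
    return IdDictionary{{TextId::MenuOpen, "Ouvrir"}, {TextId::DoorOpen, "Ouverte"}};
}

// Returns the text for an id, or an empty string if the id is unknown.
std::string text_for(const IdDictionary& dict, TextId id) {
    const auto it = dict.find(id);
    return it == dict.end() ? std::string() : it->second;
}
```

Integer ids like these are also cheap to compare and hash, which is the
tools/performance argument from the first bullet.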

I'm not sure I'm clear about all of this, so ask me if you don't understand
something. (Sorry, my English isn't perfect.)

The current boost::locale is already great work that I'd like to use as
soon as I have a case suited to its use. So, I forgot to say: good work
:)

About this
...
...
Actually one of the most widely deployed localization library - gettext has
much harder and stricter restrictions, and yet Boost.Locale implements
all gettext gives and much more.
I hadn't seen too much complains about possibility to load dictionaries from
gettext developers (unlike other issues)
...
Most libraries are not held to the same standard that Boost libraries are.
There are (good and sometimes bad) reasons why almost all game-industry
developers don't use (all of) boost and gettext. However, I'm making a
desktop game that heavily uses boost, and so far it has been a really good
idea - the alternative was POCO. I'd like to help fight the often-wrong
belief that the STL and boost are bad for games (and I'm not the only one, it
seems:
http://gamedev.stackexchange.com/questions/268/stl-for-games-yea-or-nay  -
see the answer with the most points, not the accepted one)

I planned to write my own solution for my game's localisation, since it has a
somewhat complex structure based on user-provided modules, but if
boost::locale provides a solution for the points I've listed, then I can plug
it into my game without a problem, and that will simplify a lot of things
(assuming performance is good enough for my needs). For the moment I'll keep
following how boost::locale evolves until I reach the point where I need to
make a final decision.

Joël Lamotte

On Sat, Sep 11, 2010 at 21:22, Mathias Gaunard <mathias.gaunard@ens-lyon.org>
wrote:
...
On 11/09/2010 19:30, Artyom wrote:
...
Yes, are you using the latest version 2.x from sourceforge
site or you had taken the "/trunk"? Because latest boost.locale
sits in its own branch - rework.
I didn't use your library.
What exactly did you seen? In what case? Do you save into file or into
...
std::wcout?
do_in gets called in the file to memory case.
I'm talking of a codecvt facet that converts UTF-8 in files to UTF-16 in
memory.
The behaviour I've observed is the following: the implementation of fstream
in MSVC9 seems to call 'in' char per char, calling again and appending one
character when partial is returned.
Then, in case of 'ok', it just reads the first wchar_t written on the
output, and ignores the second that would be written in the case of
surrogates.
But then, looking at your library, you seem to do some weird (and
dangerous!) reinterpret casting, which suggests you're not making the
fstream interface directly with a std::codecvt<wchar_t, char,
std::mbstate_t> facet.
How did you make that work?
Can you bring me the sample code that shows the issue?
...
Attached is a testcase that demonstrates the bug in MSVC9.
It prints "65 65 65 65 65" instead of "65 66 65 66 65 66 65 66 65 66".
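
The attached test case is not part of this archive. As a purely
illustrative sketch of the situation described (not the original
attachment), the facet below emits two wide characters, 65 then 66, for
every input byte, the way a surrogate pair would need two output units. The
names two_for_one and convert_directly are hypothetical. Calling in()
directly on the facet yields both units per byte; the reported problem is
about what the fstream implementation does with that output.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: each input byte produces TWO wide chars (65, then 66).
class two_for_one : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit two_for_one(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    result do_in(std::mbstate_t&,
                 const char* from, const char* from_end, const char*& from_next,
                 wchar_t* to, wchar_t* to_end, wchar_t*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_end - to_next < 2) return partial;  // no room for both units
            *to_next++ = L'A';  // 65
            *to_next++ = L'B';  // 66
            ++from_next;
        }
        return ok;
    }
};

// Runs the facet directly on a byte buffer, bypassing any stream class.
std::wstring convert_directly(const std::string& bytes) {
    two_for_one facet;
    std::mbstate_t state = std::mbstate_t();
    std::wstring out(bytes.size() * 2, L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    facet.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
             &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```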
