
Rogier van Dalen wrote:
[...] We thought about using multi_index but (correct me if I'm wrong) wouldn't that make it necessary to fill the database at run-time? With serialization, the overhead would probably not be that bad, but still.
I think so, but I think hashing might give an enormous runtime performance gain. I'm not particularly knowledgeable in this area, just throwing in ideas on this.
Hashing would certainly be faster than the binary search we are using now, but would we need to do that through multi_index (or any other run-time solution, for that matter)? Wouldn't it be more efficient to build the hash table statically through the code generator we are using now? You can't do lazy loading of the database and all that stuff that way, but you would lose the dependency on an external file for the database.
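To make the static hash-table idea concrete, here is a minimal sketch. It is not the code generator discussed in this thread: a constexpr function stands in for the generator, and the names (Entry, kTableSize, hash_code_point, combining_class), the hash constant, and the five sample entries are all assumptions chosen for illustration. The table is an open-addressing hash table with linear probing that is fully built at compile time, so nothing has to be filled in at run-time and no external database file is needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record: one code point and one property (its canonical
// combining class). A real database would carry more fields.
struct Entry {
    std::uint32_t code_point;
    std::uint8_t combining_class;
    bool used;  // false marks an empty slot
};

// Power of two, comfortably larger than the number of entries.
constexpr std::size_t kTableSize = 16;

// Multiplicative hash; the constant is an arbitrary choice for this sketch.
// The top four bits of the 32-bit product select one of the 16 slots.
constexpr std::size_t hash_code_point(std::uint32_t cp) {
    return static_cast<std::size_t>((cp * 2654435761u) >> 28) & (kTableSize - 1);
}

struct Table {
    Entry slots[kTableSize];
};

// Stand-in for the code generator: builds the table during compilation.
constexpr Table build_table() {
    Table t{};  // value-initialised, so every slot starts out unused
    constexpr Entry source[] = {
        {0x0300, 230, true},  // COMBINING GRAVE ACCENT
        {0x0301, 230, true},  // COMBINING ACUTE ACCENT
        {0x0316, 220, true},  // COMBINING GRAVE ACCENT BELOW
        {0x0327, 202, true},  // COMBINING CEDILLA
        {0x0345, 240, true},  // COMBINING GREEK YPOGEGRAMMENI
    };
    for (const Entry& e : source) {
        std::size_t i = hash_code_point(e.code_point);
        while (t.slots[i].used) i = (i + 1) & (kTableSize - 1);  // linear probing
        t.slots[i] = e;
    }
    return t;
}

constexpr Table kTable = build_table();

// Returns the combining class, or 0 (the default class) when the code
// point is not in the table. The probe stops at the first empty slot.
constexpr std::uint8_t combining_class(std::uint32_t cp) {
    std::size_t i = hash_code_point(cp);
    while (kTable.slots[i].used) {
        if (kTable.slots[i].code_point == cp) return kTable.slots[i].combining_class;
        i = (i + 1) & (kTableSize - 1);
    }
    return 0;
}

// The lookup itself can be evaluated at compile time.
static_assert(combining_class(0x0301) == 230, "U+0301 has combining class 230");
```

The trade-off is the one described above: lookups need no run-time setup, but the whole table is baked into the binary, so lazy loading is off the table.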
What you're saying sounds correct to me. http://groups.yahoo.com/group/boost/files/utf/ has utf-2003-01-12.zip. I have no idea what its status is but it seems to implement all kinds of UTF I/O you'll need. There's even a detect_from_bom.hpp which appears to check for a BOM and imbue the correct codecvt.
I'll take a look at it. Looks promising.
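For reference, the BOM check mentioned above can be sketched roughly as follows. This is not the code from detect_from_bom.hpp, whose interface I have not looked at; the Encoding enum and the detect_bom function are hypothetical names. The sketch only covers detection and leaves out imbuing the matching codecvt facet.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class Encoding { unknown, utf8, utf16_le, utf16_be, utf32_le, utf32_be };

// Inspects the leading bytes of `data` and reports which Unicode
// byte order mark, if any, they form.
inline Encoding detect_bom(const std::string& data) {
    const std::size_t n = data.size();
    const auto b = [&data](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };

    // UTF-32 has to be tested before UTF-16, because the UTF-32 LE mark
    // (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
    if (n >= 4 && b(0) == 0xFF && b(1) == 0xFE && b(2) == 0x00 && b(3) == 0x00)
        return Encoding::utf32_le;
    if (n >= 4 && b(0) == 0x00 && b(1) == 0x00 && b(2) == 0xFE && b(3) == 0xFF)
        return Encoding::utf32_be;
    if (n >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Encoding::utf8;
    if (n >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Encoding::utf16_le;
    if (n >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Encoding::utf16_be;
    return Encoding::unknown;  // no BOM; the caller has to pick a default
}
```

Note that a UTF-16 LE stream whose first character is U+0000 is indistinguishable from UTF-32 LE by BOM alone; this sketch resolves that in favour of UTF-32.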