
Mathias Gaunard wrote:
Thankfully, the Unicode standard defines representation and a lot of operations. The funny thing is that some languages, such as Thai, actually require a dictionary to tell words apart, since the script has no explicit word boundaries. (Alternatively, it can be done using machine learning algorithms that detect word-like constructs; there are quite a few research papers on that topic.)
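To make the dictionary point concrete, here is a minimal sketch of greedy longest-match segmentation. The function name `segment`, the toy word list, and the unspaced Latin input (a stand-in for a script without word boundaries) are all assumptions made for illustration; this is not how any particular Unicode library does it.

```python
def segment(text, dictionary):
    """Split unspaced text into words by greedy longest-match lookup.

    `dictionary` is any container of known words (hypothetical toy data here).
    Characters that start no known word are emitted as one-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


toy_dictionary = {"the", "cat", "cats", "sat", "at"}
print(segment("thecatsat", toy_dictionary))  # ['the', 'cats', 'at']
```

Note that the greedy strategy picks "the cats at" even though "the cat sat" is an equally valid reading of the same input, which is the kind of ambiguity that pushes people towards statistical or machine-learning approaches.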
I would take that further: rather than only allowing human languages, generalize the concept to take any form of bit grouping, so that it could be useful in other areas of computer science. Furthermore, Bloom filters would be only one aspect of such a library; the underlying data structures would require tries, wide-column stores, etc. It seems more like a few GSoC projects.

Arash Partow
________________________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net