
Phil Endecott wrote:
Felipe Magno de Almeida wrote:
On Fri, Feb 15, 2008 at 3:54 PM, Phil Endecott wrote:
This week I have been writing some UTF-8 encoding and decoding and Unicode<->iso8859 conversion algorithms. They seem to be faster than the libc implementations, which is satisfying, especially as I haven't even started on the serious optimisations yet. This will be part of the strings-tagged-with-character-sets stuff that I have described before. Anyone interested?
Sure. Though I'm most interested in all charset conversions, the most common ones would be enough to speed up my application *a lot*.
Thanks to everyone who expressed an interest.
I will attempt to have some sort of documentation and code available in the next few days. Pester me if I don't produce anything.
OK, the code is here:
http://svn.chezphil.org/libpbe/trunk/include/charset/
and there are some very basic docs here:
http://svn.chezphil.org/libpbe/trunk/doc/charsets/
(Have a look at intro.txt for the feature list.)

This code is not yet Boostified (namespaces, directory layout etc.). Most of it compiles, but it has hardly been exercised at all. The functionality includes conversion between UTF-8, UCS-2, UCS-4, ASCII and ISO-8859-*.

Things I'd appreciate feedback on:

- What should the cs_string look like? Basically, everywhere that std::string uses an integer position I have the choice of a character position, a unit position, or an iterator, or of not providing that function.
- What character sets are people interested in using (a) at the "edges" of their programs, and (b) in the "core"?

Regards,

Phil.
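
To make the character-position versus unit-position question concrete, here is a minimal sketch for UTF-8 stored in a plain std::string. It is not taken from the charset code linked above: the names is_continuation, char_length and unit_pos_of_char are hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of the linked code.

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char unit) {
    return (unit & 0xC0) == 0x80;
}

// Number of characters (code points) in a UTF-8 string.
// Assumes valid UTF-8: every non-continuation byte starts one character.
inline std::size_t char_length(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(utf8[i]))) ++n;
    }
    return n;
}

// Unit (byte) offset at which the character with index char_pos starts.
// Returns utf8.size() if char_pos is at or past the end of the string.
inline std::size_t unit_pos_of_char(const std::string& utf8,
                                    std::size_t char_pos) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(utf8[i]))) {
            if (seen == char_pos) return i;
            ++seen;
        }
    }
    return utf8.size();
}
```

For the six-byte string "na\xC3\xAFve" (five characters), the unit position and the character position agree for the first three characters and then diverge. Finding a character position needs a linear scan, whereas a unit position or an iterator can be used directly, which is the trade-off behind the cs_string interface question.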