Re: [boost] UTF-8 conversion etc.

19 Feb 2008

Phil Endecott wrote:
> Felipe Magno de Almeida wrote:
>> On Fri, Feb 15, 2008 at 3:54 PM, Phil Endecott wrote:
>>> This week I have been writing some UTF-8 encoding and decoding and
>>> Unicode <-> ISO-8859 conversion algorithms.  They seem to be faster
>>> than the libc implementations, which is satisfying, especially as I
>>> haven't even started on the serious optimisations yet.  This will be
>>> part of the strings-tagged-with-character-sets stuff that I have
>>> described before.  Anyone interested?
>>
>> Sure. I'm most interested in conversions between all character sets,
>> but the most common ones alone would speed up my application *a lot*.
>
> Thanks to everyone who expressed an interest.
> I will attempt to have some sort of documentation and code available in
> the next few days.  Pester me if I don't produce anything.
OK, the code is here:
   http://svn.chezphil.org/libpbe/trunk/include/charset/

and there are some very basic docs here:
   http://svn.chezphil.org/libpbe/trunk/doc/charsets/
(Have a look at intro.txt for the feature list.)

This code is not yet Boostified (namespaces, directory layout, etc.).
Most of it compiles, but it has hardly been exercised at all.
The functionality includes conversion between UTF-8, UCS-2, UCS-4, 
ASCII and ISO-8859-*.
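For readers who want to picture what these conversions involve, here is a minimal C++ sketch for UTF-8 and ISO-8859-1. It is not the libpbe code: the functions utf8_encode, utf8_decode, latin1_to_utf8 and utf8_to_latin1 are hypothetical, and char32_t values stand in for UCS-4 code points. It relies on the fact that ISO-8859-1 maps bytes 0x00 to 0xFF directly onto U+0000 to U+00FF; the other ISO-8859 parts would need a per-part lookup table, which this sketch leaves out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration; not the libpbe API.

// Append the UTF-8 encoding (1 to 4 bytes) of code point cp to out.
void utf8_encode(char32_t cp, std::string& out) {
  if (cp >= 0xD800 && cp <= 0xDFFF)
    throw std::invalid_argument("surrogates cannot be encoded");
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    throw std::invalid_argument("code point above U+10FFFF");
  }
}

// Decode one code point starting at byte offset i and advance i past it.
// Throws std::invalid_argument on malformed input; s.at() throws
// std::out_of_range if a multi-byte sequence is truncated.
char32_t utf8_decode(const std::string& s, std::size_t& i) {
  auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s.at(k)); };
  unsigned char b0 = byte(i);
  int len;
  char32_t cp;
  if (b0 < 0x80) { len = 1; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
  else throw std::invalid_argument("invalid lead byte");
  for (int k = 1; k < len; ++k) {
    unsigned char b = byte(i + k);
    if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid continuation byte");
    cp = (cp << 6) | (b & 0x3F);
  }
  // Reject overlong forms, surrogates and values beyond U+10FFFF.
  static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
  if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    throw std::invalid_argument("invalid code point");
  i += len;
  return cp;
}

// ISO-8859-1 byte values are identical to the first 256 code points.
std::string latin1_to_utf8(const std::string& in) {
  std::string out;
  for (unsigned char c : in) utf8_encode(c, out);
  return out;
}

std::string utf8_to_latin1(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size();) {
    char32_t cp = utf8_decode(in, i);
    if (cp > 0xFF) throw std::range_error("not representable in ISO-8859-1");
    out += static_cast<char>(cp);
  }
  return out;
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC, and "caf\xE9" in ISO-8859-1 becomes "caf\xC3\xA9" in UTF-8. The decoder deliberately rejects overlong forms such as C0 80 for U+0000, since accepting them is a well-known source of security bugs; a real library might let the caller choose between throwing and substituting U+FFFD.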

Things I'd appreciate feedback on:
- What should the cs_string look like?  Wherever std::string uses an
integer position, I can choose between a character position, a code-unit
position, or an iterator, or I can omit that function altogether.
- What character sets are people interested in using (a) at the "edges" 
of their programs, and (b) in the "core"?
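The first question is largely about cost. As a hedged illustration, the sketch below shows what each of the three kinds of position implies for a UTF-8 string. The class utf8_string and all of its member names are hypothetical, not the cs_string interface, and the sketch assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical UTF-8 string type for illustration; not libpbe's cs_string.
// Assumes well-formed UTF-8 input: no validation is done.
class utf8_string {
  static bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }
  // Number of bytes in the sequence introduced by lead byte b.
  static std::size_t seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
  }

  std::string units_;

 public:
  explicit utf8_string(std::string bytes) : units_(std::move(bytes)) {}

  // (a) Unit positions: O(1) like std::string, but may point mid-character.
  std::size_t unit_length() const { return units_.size(); }
  char unit_at(std::size_t pos) const { return units_.at(pos); }

  // (b) Character positions: what users expect, but O(n) to resolve.
  std::size_t char_length() const {
    std::size_t n = 0;
    for (unsigned char c : units_)
      if (!is_continuation(c)) ++n;
    return n;
  }
  std::size_t unit_offset_of_char(std::size_t char_pos) const {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < units_.size(); ++i) {
      if (is_continuation(static_cast<unsigned char>(units_[i]))) continue;
      if (seen == char_pos) return i;
      ++seen;
    }
    if (seen == char_pos) return units_.size();  // one past the last character
    throw std::out_of_range("character position out of range");
  }

  // (c) Iterators: O(1) per step and always on a character boundary.
  class const_iterator {
    const std::string* s_;
    std::size_t i_;

   public:
    const_iterator(const std::string* s, std::size_t i) : s_(s), i_(i) {}
    // Decode the code point at the current position.
    char32_t operator*() const {
      unsigned char b0 = static_cast<unsigned char>((*s_)[i_]);
      std::size_t len = seq_len(b0);
      char32_t cp = (len == 1) ? b0 : (b0 & (0x7F >> len));  // lead-byte payload
      for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>((*s_)[i_ + k]) & 0x3F);
      return cp;
    }
    const_iterator& operator++() {
      i_ += seq_len(static_cast<unsigned char>((*s_)[i_]));
      return *this;
    }
    bool operator==(const const_iterator& o) const { return i_ == o.i_; }
    bool operator!=(const const_iterator& o) const { return i_ != o.i_; }
    std::size_t unit_offset() const { return i_; }
  };

  const_iterator begin() const { return const_iterator(&units_, 0); }
  const_iterator end() const { return const_iterator(&units_, units_.size()); }
};
```

With the bytes "na\xC3\xAFve" ("naïve"), unit_length() is 6, char_length() is 5, and character 3 ('v') starts at unit offset 4. Unit positions keep std::string's O(1) cost but can split a character; character positions need a linear scan, so a loop that indexes by character becomes quadratic; iterators are cheap and always land on a boundary, at the price of a less familiar interface. For UCS-4 (and UCS-2, which has no surrogates) unit and character positions coincide, so the choice mainly matters for variable-width encodings such as UTF-8.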

Regards,  Phil.