Re: [boost] UTF-8 conversion etc.

25 Feb 2008


On Mon, Feb 25, 2008 at 8:09 AM, Sebastian Redl
<sebastian.redl@getdesigned.at> wrote:
...
Phil Endecott wrote:
...
Things I'd appreciate feedback on:
- What should the cs_string look like?  Basically everywhere that
std::string uses an integer position I have the choice of a character
position, a unit position, or an iterator - or not providing that function.
I think emulating std::string doesn't work. It has a naive design based
 on the assumption of fixed-width encodings. I think that a tagged string
 is the best place to really start over with a string design and produce
 a string that is lean, rather than bloated.
I agree.
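
To make the character-position versus unit-position distinction concrete, here is a minimal sketch for UTF-8 stored in a std::string. The helper names (is_utf8_lead, utf8_char_count, utf8_unit_offset) are made up for illustration and are not part of any proposed cs_string interface; the input is assumed to be well-formed UTF-8. It shows why an integer character index cannot be O(1) on a variable-width encoding: each lookup has to walk the units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the byte starts an encoded character,
// i.e. it is not a UTF-8 continuation byte of the form 10xxxxxx.
inline bool is_utf8_lead(unsigned char b) { return (b & 0xC0) != 0x80; }

// Number of characters (code points) in a well-formed UTF-8 string.
// This is O(n) in units, unlike std::string::size().
inline std::size_t utf8_char_count(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (is_utf8_lead(s[i])) ++n;
    return n;
}

// Unit (byte) offset of the character at index char_pos.
// Passing char_pos == utf8_char_count(s) yields s.size() (one past the end).
inline std::size_t utf8_unit_offset(const std::string& s, std::size_t char_pos) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_utf8_lead(s[i])) {
            if (char_pos == 0) return i;
            --char_pos;
        }
    }
    assert(char_pos == 0);  // anything larger is out of range
    return s.size();
}
```

With the six-unit string "na\xC3\xAF" "ve", the character count is 5 and the fourth character ('v', character index 3) sits at unit offset 4. An iterator remembers the unit offset, which is why it avoids the repeated walk.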
...
I think the string type should offer minimal manipulation facilities -
 either completely read-only or append as the only manipulation function.
I would like to have at least a modifiable string, but modifiable only
through iterators (insert and erase).
That should suffice for all my algorithm needs.
...
A string buffer type could be written as a mutable alternative, as is
 the design in Java and C#. However, I'm not sure how much of that
 interface is needed, either.
A modifiable iterator interface (with insert and erase) is, IMO, as
concise and extensible as possible.
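
A rough sketch of what such an iterator-only interface could look like for UTF-8 follows. The class name utf8_string and all of its members are hypothetical, invented here for illustration; it assumes its input is already valid UTF-8, and the iterator omits dereferencing and decrement for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal UTF-8 string: no integer positions, only iterators.
class utf8_string {
public:
    // Character-level position; the stored unit offset always sits on a
    // lead byte or at the end of the string.
    class const_iterator {
    public:
        const_iterator(const std::string* s, std::size_t off) : s_(s), off_(off) {}
        // Advance by one character, skipping its continuation bytes.
        const_iterator& operator++() {
            assert(off_ < s_->size());
            do { ++off_; } while (off_ < s_->size() && is_continuation((*s_)[off_]));
            return *this;
        }
        bool operator==(const const_iterator& o) const { return off_ == o.off_; }
        bool operator!=(const const_iterator& o) const { return off_ != o.off_; }
        std::size_t unit_offset() const { return off_; }
    private:
        static bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }
        const std::string* s_;
        std::size_t off_;
    };

    utf8_string() {}
    // Assumption: the caller passes well-formed UTF-8; nothing is validated.
    explicit utf8_string(const std::string& valid_utf8) : units_(valid_utf8) {}

    const_iterator begin() const { return const_iterator(&units_, 0); }
    const_iterator end() const { return const_iterator(&units_, units_.size()); }

    // Insert text before pos; returns an iterator to the first inserted character.
    const_iterator insert(const_iterator pos, const utf8_string& text) {
        units_.insert(pos.unit_offset(), text.units_);
        return const_iterator(&units_, pos.unit_offset());
    }

    // Erase [first, last); returns an iterator to the character that followed.
    const_iterator erase(const_iterator first, const_iterator last) {
        units_.erase(first.unit_offset(), last.unit_offset() - first.unit_offset());
        return const_iterator(&units_, first.unit_offset());
    }

    const std::string& units() const { return units_; }

private:
    std::string units_;
};
```

Because both operations take iterators that already sit on character boundaries, they cannot split a multi-unit character, and the caller never needs to know how many units a character occupies.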
...
I'd love to have some empirical data on string usage.
I do some string manipulation on email, and it is usually better to
do all of it in the code page the message was received in, instead of
converting back and forth.
...
...
- What character sets are people interested in using (a) at the "edges"
of their programs,
 As many as possible. Theoretically, a program might have to deal with
 any and all encodings out there. Realistically, there are probably a dozen
 or two that are relevant. You'd need empirical data.
Unfortunately, I need all the character sets supported by MIME.
...
...
and (b) in the "core"?
ASCII, UTF-8 and UTF-16.
ISO-8859-1?
...
Sebastian
-- 
Felipe Magno de Almeida