Re: [boost] [General] Always treat std::strings as UTF-8

15 Jan 2011


On Fri, Jan 14, 2011 at 9:35 PM, Patrick Horgan <phorgan1@gmail.com> wrote:
> ...
> On 01/14/2011 02:05 PM, Peter Dimov wrote:
>> ...
>> John B. Turpish wrote:
>>> ...
>>> By the way, I disagree with Peter's assessment that, "you rarely, if
>>> ever, need to access the Nth character," but I will gladly cede that this
>>> depends on your problem domain.
>>
>> It obviously depends on the problem domain :-) but, when talking about
>> Unicode, you can't reliably access the Nth character, in general, even with
>> UCS-32. (As far as I know.)
>
> I don't understand.  UCS-32 (I assume you meant encoded as UTF-32) is a
> fixed width encoding so the n-th character is just 4n away from the
> beginning of the string.  Right?

No.  The nth code point is 4n bytes from the beginning of the string,
but a single character may be made up of several adjacent code points,
such as a base letter followed by one or more combining marks.
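
A minimal sketch of the distinction, assuming C++11 and std::u32string.
The helper names (is_combining_mark, code_point_count,
approx_character_count) are made up for illustration, and the
combining-mark test is deliberately simplified: it only recognizes
U+0300..U+036F, whereas real segmentation follows the Unicode text
segmentation rules and handles many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as attaching to the preceding code point.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// In UTF-32 every code point is exactly one 4-byte code unit.
std::size_t code_point_count(const std::u32string& s) {
    return s.size();
}

// Approximate count of user-perceived characters: every code point
// that is not a combining mark starts a new character.
std::size_t approx_character_count(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) {
        if (!is_combining_mark(cp)) ++n;
    }
    return n;
}

void example() {
    // "e", then U+0301 COMBINING ACUTE ACCENT, then "x".
    const std::u32string s = U"e\u0301x";
    assert(code_point_count(s) == 3);        // three code points
    assert(approx_character_count(s) == 2);  // but only two characters
    assert(s[1] == U'\u0301');               // index 1 is the accent, not "x"
}
```

So indexing at 4n bytes lands on the nth code point, which here is the
accent rather than the second character.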


-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com