
On Fri, Jan 14, 2011 at 9:35 PM, Patrick Horgan <phorgan1@gmail.com> wrote:
> On 01/14/2011 02:05 PM, Peter Dimov wrote:
>> John B. Turpish wrote:
>>> By the way, I disagree with Peter's assessment that, "you rarely, if ever, need to access the Nth character," but I will gladly cede that this depends on your problem domain.
>> It obviously depends on the problem domain :-) but, when talking about Unicode, you can't reliably access the Nth character, in general, even with UCS-32. (As far as I know.)
> I don't understand. UCS-32 (I assume you meant encoded as UTF-32) is a fixed width encoding so the n-th character is just 4n away from the beginning of the string. Right?
No. The nth code point is 4n bytes from the beginning of the string, but characters may be made of a combination of adjacent code points.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
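
To make the code point versus character distinction concrete, here is a minimal Python sketch. It is an illustration only: the helper names code_point_at and rough_characters are made up for this example, and the grouping rule (attach each combining mark to the preceding base code point) is a deliberate simplification, not the full grapheme cluster segmentation described in Unicode Standard Annex #29.

```python
import unicodedata


def code_point_at(utf32_le: bytes, n: int) -> str:
    """Return the n-th code point of a BOM-less UTF-32LE buffer.

    Fixed width makes this a constant-time slice at byte offset 4*n.
    """
    return utf32_le[4 * n : 4 * n + 4].decode("utf-32-le")


def rough_characters(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Assumption: this is a simplified stand-in for real grapheme cluster
    segmentation; it only looks at the canonical combining class.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)  # start a new character
    return clusters


# 'e' + U+0301 COMBINING ACUTE ACCENT + 'a': three code points,
# but a reader sees two characters.
text = "e\u0301a"
buf = text.encode("utf-32-le")

second_code_point = code_point_at(buf, 1)  # the accent on its own
characters = rough_characters(text)        # accent stays with its 'e'
```

Indexing the buffer at 4n finds the accent by itself, which is a code point but not something a reader would call the second character. Finding the second character means scanning from the start and grouping, so the fixed width of UTF-32 does not help.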