
From: Patrick Horgan <phorgan1@gmail.com>
On 01/14/2011 02:05 PM, Peter Dimov wrote:
John B. Turpish wrote:
By the way, I disagree with Peter's assessment that, "you rarely, if ever, need to access the Nth character," but I will gladly cede that this depends on your problem domain.
It obviously depends on the problem domain :-) but, when talking about Unicode, you can't reliably access the Nth character, in general, even with UCS-32. (As far as I know.)
I don't understand. UCS-32 (I assume you meant text encoded as UTF-32) is a fixed-width encoding, so the n-th character is just 4n bytes from the beginning of the string. Right?
No, the Nth position holds the Nth Unicode code point, not the Nth character. For example, the word "שָלוֹם" has 4 characters ("שָ", "ל", "וֹ", "ם") but 6 code points: ש ָ ל ו ֹ ם, two of which are diacritic marks. Boost.Locale has a special character iterator for this purpose; it works on characters, not code points.
See: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#8e296a067a3756...
Artyom
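
To make the distinction concrete, here is a minimal sketch in plain C++ (it does not use Boost.Locale). It indexes a UTF-32 string by code point and then groups each base letter with the marks that follow it. The names `is_combining_mark` and `split_characters` are hypothetical helpers made up for this illustration, and the mark test is a deliberate simplification covering only the Hebrew points and the basic combining diacritics block. Real character segmentation follows the Unicode text segmentation rules, which is what a library such as Boost.Locale or ICU is for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a deliberately incomplete test for combining marks.
// It only knows the basic combining diacritics block and the Hebrew points.
bool is_combining_mark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)    // combining diacritical marks
        || (cp >= 0x0591 && cp <= 0x05BD)    // Hebrew accents and points
        || cp == 0x05BF || cp == 0x05C1 || cp == 0x05C2
        || cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7;
}

// Hypothetical helper: group each base code point with the marks after it.
// This approximates "characters"; it is not full grapheme segmentation.
std::vector<std::u32string> split_characters(const std::u32string& text) {
    std::vector<std::u32string> characters;
    for (char32_t cp : text) {
        if (characters.empty() || !is_combining_mark(cp)) {
            characters.push_back(std::u32string(1, cp));  // start a new character
        } else {
            characters.back().push_back(cp);              // attach mark to its base
        }
    }
    return characters;
}

// The word from the message above, one UTF-32 unit per code point:
// shin, qamats, lamed, vav, holam, final mem.
std::u32string sample_word() {
    return U"\u05E9\u05B8\u05DC\u05D5\u05B9\u05DD";
}
```

With these definitions, `sample_word().size()` is 6 and `sample_word()[1]` is the qamats mark (U+05B8), not the second letter, while `split_characters(sample_word())` yields 4 groups, the second of which starts with lamed (U+05DC). Fixed-width indexing gives constant-time access to code points only; reaching the Nth character still needs a scan.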