
On Fri, Jan 14, 2011 at 1:36 PM, Alexander Churanov <alexanderchuranov@gmail.com> wrote:
John,
As I understand it, the choice is between UTF-8 and UTF-16, since UTF-32 is a waste of memory. Given that, there is never a fixed size for a character or constant-time access to the Nth one; both UTF-8 and UTF-16 are variable-length encodings of the code points that UTF-32 stores directly.
Yes, my comment was in response to a comment about UTF-32 as an internal encoding. I'd only use UTF-16 if the APIs I depended on required it, and the conversion could be done at the interface (for example, in a facade). What interests me is whether there's a good reason to use UTF-8 internally and give UTF-32 the same treatment as UTF-16, or vice versa. I do find the simplicity of a fixed-width encoding alluring. By the way, I disagree with Peter's assessment that "you rarely, if ever, need to access the Nth character," but I will gladly cede that this depends on your problem domain.
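
To make the tradeoff concrete, here is a minimal sketch (my own illustration, not anything proposed in this thread; the function names are made up) contrasting how you reach the Nth code point in UTF-8 versus UTF-32:

    // Sketch: Nth-code-point access under a variable-width vs. fixed-width encoding.
    #include <cstdint>
    #include <cstddef>
    #include <string>
    #include <vector>
    #include <iostream>

    // UTF-8: continuation bytes look like 10xxxxxx, so reaching the Nth code
    // point means a linear scan that skips them -- O(n).
    std::size_t utf8_index_of_code_point(const std::string& s, std::size_t n)
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) { // lead byte
                if (count == n)
                    return i;   // byte offset where the Nth code point starts
                ++count;
            }
        }
        return s.size();        // n is past the end
    }

    // UTF-32: one code unit per code point, so the Nth character is plain
    // array indexing -- O(1), at the cost of four bytes per character.
    char32_t utf32_code_point(const std::vector<char32_t>& s, std::size_t n)
    {
        return s[n];
    }

    int main()
    {
        std::string utf8 = "a\xC3\xA9z";                    // "aéz": 'é' takes two bytes
        std::vector<char32_t> utf32 = { U'a', U'\u00E9', U'z' };

        std::cout << "byte offset of 3rd code point in UTF-8: "
                  << utf8_index_of_code_point(utf8, 2) << '\n';  // prints 3, not 2
        std::cout << "3rd code point via UTF-32 indexing: "
                  << static_cast<std::uint32_t>(utf32_code_point(utf32, 2)) << '\n';
    }

Whether that linear scan matters in practice is exactly the point of disagreement: if you index by character often, the fixed-width form is simpler; if you mostly iterate or search, the scan cost tends to disappear into work you were doing anyway.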