
On Fri, Nov 30, 2012 at 12:29 PM, Ryo IGARASHI <rigarash@gmail.com> wrote:
Hi Artyom,
On Thu, Nov 29, 2012 at 10:45 PM, Artyom Beilis <artyomtnk@yahoo.com> wrote:
If so there is no such a locale under windows that works with Shift_JIS...
[...] See the reference information from Microsoft: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q170559 (Note that 'Shift JIS' in the above link means CP932)
This means that in order to handle the Japanese string properly under Windows, the programmers are encouraged not to convert at all. [...]
As I understand from that page, the problem with CP932 is that it has duplicate code points, so a CP932 → UTF-8 → CP932 round trip can produce text that is binary different from the original but semantically identical. I do not see a problem with this.

Unicode itself has *many more* ways to encode the same thing, including, but not limited to, duplicate code points and combining characters, and we have been living with this fine for years. The solution is normalization, if this *really* matters. And where it does matter (comparison, most likely; what else?), you will be forced to normalize your CP932 text too...

-- Yakov
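
P.S. To make the round-trip argument concrete, here is a minimal sketch in Python. It is only an illustration, not anything from the library under discussion. It assumes Python's built-in "cp932" codec follows Microsoft's mapping table, in which the JIS X 0208 position 0x81E6 and the NEC row 13 position 0x879A both map to U+2235 (BECAUSE). The names round_trip, JIS_BECAUSE and NEC_BECAUSE are made up for this example.

```python
import unicodedata

# Two CP932 byte sequences that (by assumption, per Microsoft's mapping
# table) denote the same character, U+2235 BECAUSE.
JIS_BECAUSE = b"\x81\xe6"  # JIS X 0208 position
NEC_BECAUSE = b"\x87\x9a"  # NEC row 13 duplicate


def round_trip(data: bytes) -> bytes:
    """Convert CP932 -> UTF-8 -> CP932 and return the resulting bytes."""
    utf8 = data.decode("cp932").encode("utf-8")
    return utf8.decode("utf-8").encode("cp932")


# Both inputs decode to the same text ...
same_text = JIS_BECAUSE.decode("cp932") == NEC_BECAUSE.decode("cp932")
# ... so the round trip maps both to one byte sequence, which means at
# least one of the two inputs comes back binary different.
same_bytes = round_trip(JIS_BECAUSE) == round_trip(NEC_BECAUSE)

# Unicode has the same kind of redundancy: precomposed vs. combining forms.
precomposed = "\u00e9"  # e with acute accent as a single code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
equal_raw = precomposed == combining
equal_nfc = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", combining))

print(same_text, same_bytes, equal_raw, equal_nfc)
```

Comparing the decoded text instead of the raw CP932 bytes is, in effect, the normalization step for CP932 mentioned above. For Unicode text, the same role is played by comparing after unicodedata.normalize.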