8 Jan
2020
1:16 a.m.
Gavin Lambert wrote:
But the conversion from WTF-8 to UTF-16 can interpret the joining point as a different character, resulting in a different sequence. Unless I've misread something, this could occur if the first string ended in an unpaired high surrogate and the second started with an unpaired low surrogate (or rather the WTF-8 equivalents thereof).
I don't see why you think this would present a problem. The conversion of the first string will end in an unpaired high surrogate. The conversion of the second string will start with an unpaired low surrogate. The two, when concatenated, will form a valid UTF-16 encoding of a non-BMP character. Where is the issue here?
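
To make the point concrete, here is a rough sketch in Python. It is not the library's conversion routine: it assumes Python's `surrogatepass` error handler as a stand-in for a WTF-8 decoder, and the helper name `wtf8_to_utf16le` and the sample bytes are made up for illustration.

```python
def wtf8_to_utf16le(data: bytes) -> bytes:
    """Hypothetical helper: convert WTF-8-style bytes to UTF-16LE.

    Assumption: Python's 'surrogatepass' handler approximates WTF-8 by
    letting lone surrogates through in both directions.
    """
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# First string ends in an unpaired high surrogate U+D83D (bytes ED A0 BD).
first = b"a" + b"\xed\xa0\xbd"
# Second string starts with an unpaired low surrogate U+DE00 (bytes ED B8 80).
second = b"\xed\xb8\x80" + b"b"

# Convert each string separately, then concatenate the UTF-16 results.
joined = wtf8_to_utf16le(first) + wtf8_to_utf16le(second)

# The two lone surrogates now sit next to each other and form a valid pair,
# so a strict UTF-16 decoder accepts the result.
decoded = joined.decode("utf-16-le")
print(decoded == "a\U0001F600b")  # True
```

The strict `utf-16-le` decode at the end would raise if the joined sequence were ill-formed, so the fact that it succeeds is the whole claim: high surrogate 0xD83D followed by low surrogate 0xDE00 is the UTF-16 encoding of U+1F600.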