
On 05/24/2010 05:06 PM, Artyom wrote:
- There is absolutely no information given about std::mbstate_t, which should save intermediate data between conversions, so there is actually no way to pass anything between sequential calls of std::locale::codecvt<...>::in/out. So even if I observe the first surrogate of a pair, there is no way to pass this information to the next call, and thus I lose this information.
Well, that's not exactly true. mbstate_t is defined by the C standard, which indeed says pretty much nothing about its nature, except that it's not an array type. But on every platform I have worked with (including Windows) it's an integer. I think it is perfectly fair to assume that it is at least a POD and that sizeof(mbstate_t) >= 1, which makes it possible to store information about surrogate pairs in it.

The C++ standard does give some hints regarding how the conversion state shall be handled by the stream. In particular, it specifies that the state will be value-initialized at the beginning of the conversion, and that the stream will call `unshift` at the end of the conversion in order to finalize the converted character sequence and return the state to its initial value.

Not that it makes it easier to use mbstate_t with ICU under the hood, but it seems possible (theoretically, at least) to implement the complete UTF-16 <-> char conversion with it.

PS: I don't pretend to have learned the standards by heart. All the references are off the top of my head. :)
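
To make the mbstate_t idea a bit more concrete, here is a minimal sketch of keeping a pending high surrogate in the state between calls. It is only an illustration under stated assumptions, not a complete codecvt facet: the helper names (save_pending, load_pending, clear_pending, feed_utf16) are hypothetical, and the code assumes that mbstate_t is trivially copyable, is at least two bytes wide, and that a value-initialized state begins with zero bytes. The static_asserts check the first two assumptions; the third is not guaranteed by either standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <type_traits>

// Assumptions, not guarantees of the C or C++ standard.
static_assert(std::is_trivially_copyable<std::mbstate_t>::value,
              "sketch assumes mbstate_t can be copied bytewise");
static_assert(sizeof(std::mbstate_t) >= sizeof(std::uint16_t),
              "sketch assumes mbstate_t can hold one UTF-16 code unit");

// Hypothetical helper: stash a high surrogate in the first bytes of the state.
inline void save_pending(std::mbstate_t& st, std::uint16_t high) {
    assert(high >= 0xD800 && high <= 0xDBFF);
    std::memcpy(&st, &high, sizeof high);
}

// Hypothetical helper: read the stashed unit back; 0 means "nothing pending"
// (assumes a value-initialized mbstate_t starts with zero bytes).
inline std::uint16_t load_pending(const std::mbstate_t& st) {
    std::uint16_t v;
    std::memcpy(&v, &st, sizeof v);
    return v;
}

// Hypothetical helper: return the state to its initial value.
inline void clear_pending(std::mbstate_t& st) {
    st = std::mbstate_t();
}

// Feed one UTF-16 code unit. Returns true and sets cp when a whole code
// point is available. Returns false when the unit was a high surrogate that
// is now stored in st, or when the input was malformed (a real facet would
// report an error instead of silently dropping the pair).
inline bool feed_utf16(std::mbstate_t& st, std::uint16_t unit, std::uint32_t& cp) {
    const std::uint16_t high = load_pending(st);
    if (high != 0) {
        clear_pending(st);
        if (unit < 0xDC00 || unit > 0xDFFF)
            return false;  // simplification: malformed input is discarded
        cp = 0x10000u
           + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
           + (static_cast<std::uint32_t>(unit) - 0xDC00u);
        return true;
    }
    if (unit >= 0xD800 && unit <= 0xDBFF) {
        save_pending(st, unit);  // wait for the low surrogate in a later call
        return false;
    }
    cp = unit;  // BMP code unit, nothing to remember
    return true;
}
```

With a value-initialized state, feeding 0xD83D returns false and leaves the surrogate in the state; feeding 0xDE00 in a later call then yields U+1F600 and returns the state to its initial value. A codecvt implementation could do the same bookkeeping across separate in/out calls.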