
On 17/07/2010 17:26, Robert Ramey wrote:
and my personal favorite - dataflow iterators. But I suspect this functionality has probably been covered by Ranges - I don't know this, I'm just deducing that from the name.
I don't really understand what dataflow iterators are. Isn't it just a syntactic shortcut in the constructor of an iterator adaptor that calls the base constructor recursively?
Oh - codecvt_utf8. The whole codecvt thing is ripe for a library.
Since wchar_t is potentially 16-bit, utf8_codecvt_facet should transcode between UTF-8 and UTF-16, not between UTF-8 and UCS-2 as it does now. However, according to what someone said in another thread, it doesn't appear to be possible to do N to M conversion well with a codecvt facet.
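To illustrate the UTF-16 versus UCS-2 point: a code point above U+FFFF needs two 16-bit units (a surrogate pair), which UCS-2 has no way to express. The helper below is only a minimal sketch of that encoding step; the name append_utf16 is made up for this example and is not part of utf8_codecvt_facet or any Boost library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Boost API): append one Unicode code point
// to a UTF-16 string. Code points below U+10000 take one 16-bit unit;
// the rest take a surrogate pair, which is what UCS-2 cannot represent.
inline void append_utf16(std::u16string& out, std::uint32_t cp)
{
    // Precondition: a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;                                                // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
}
```

A four-byte UTF-8 sequence therefore maps to two output units, so the output width per input sequence is not fixed.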
I realise that some proposals have been made in this area. I haven't studied them in detail so I don't want to be critical. But my experience with using the codecvt facility in the serialization library leads me to suspect that it is better than is generally appreciated. In fact, the whole of C++ streams is better than it first appears. The problem is that it's sort of obtuse. Some libraries to help support it would help explain and promote this. I'm thinking of things like composable codecvt facets and alternative filebuf implementations. I've always felt the boost streams library got a little off track by not leveraging the standard library enough - a missed opportunity in my opinion.
What I've got as part of my Unicode library is a straightforward Converter concept and convert iterators/ranges. You define a Converter that describes how to do one step of an arbitrary variable-width N to M conversion with input and output iterators. You can then turn it into an iterator adaptor to convert as you traverse, or just apply it in a loop to convert the whole range eagerly. You can of course apply different Converters one after the other or even compose Converters, although the latter has limits since the steps need to play nicely together (i.e. either the Converter needs to be stable by concatenation, or the one applied first needs to have fixed-width output). I have made a facility to make a codecvt facet out of any Converter, but I suspect it doesn't really work at all, since I don't think I deal with "partial" cases correctly, and I haven't come up with a practical way of dealing with Converters that are not stable by concatenation. The fact that you can't have anything other than char/char or wchar_t/char is also a bit limiting.
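As a rough illustration of the idea, here is what a Converter and the eager loop could look like. This is a sketch under assumptions, not the library's actual interface: the names utf8_decoder_sketch, convert_all and decode_utf8_sketch, and the choice of returning a pair of updated iterators from each step, are all invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Converter: one step consumes one UTF-8 sequence (1 to 4
// bytes) and emits exactly one code point. Overlong forms and encoded
// surrogates are not rejected in this sketch.
struct utf8_decoder_sketch
{
    template<class In, class Out>
    std::pair<In, Out> operator()(In first, In last, Out out) const
    {
        assert(first != last);
        const unsigned char lead = static_cast<unsigned char>(*first++);

        int extra; // number of continuation bytes announced by the lead byte
        if (lead < 0x80)              extra = 0;
        else if ((lead >> 5) == 0x06) extra = 1;
        else if ((lead >> 4) == 0x0E) extra = 2;
        else if ((lead >> 3) == 0x1E) extra = 3;
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        std::uint32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        for (int i = 0; i < extra; ++i) {
            if (first == last)
                throw std::invalid_argument("truncated UTF-8 sequence");
            const unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3Fu);
        }
        *out++ = cp;
        return std::make_pair(first, out);
    }
};

// Eager application: repeat single steps until the input is exhausted.
template<class Converter, class In, class Out>
Out convert_all(Converter conv, In first, In last, Out out)
{
    while (first != last) {
        std::pair<In, Out> r = conv(first, last, out);
        first = r.first;
        out = r.second;
    }
    return out;
}

// Convenience wrapper for the example: UTF-8 bytes to code points.
inline std::vector<std::uint32_t> decode_utf8_sketch(const std::string& in)
{
    std::vector<std::uint32_t> code_points;
    convert_all(utf8_decoder_sketch(), in.begin(), in.end(),
                std::back_inserter(code_points));
    return code_points;
}
```

Because this decoder emits exactly one code point per step, its output is fixed-width, so a second Converter (say, a code point to UTF-16 encoder) could be applied to the result with the same loop. An iterator adaptor would run the same step lazily on dereference instead of eagerly.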