
Hi Artyom,

On Mon, May 28, 2012 at 2:33 PM, Artyom Beilis <artyomtnk@yahoo.com> wrote:
I would like comments on a library that I want to submit for a formal review.
The library provides an implementation of standard C and C++ library functions such that their inputs are UTF-8 aware on Windows, without requiring the Wide API to make a program work on Windows.

Here are my 0.02 Euro:

I completely agree that for general-purpose text storage and handling (reading lines from a text file or console, reading user input from a GUI, displaying formatted (and localized) messages to the user in a UI, etc.) UTF-8 should *finally* be adopted. The other encodings (including UCS-2, UTF-16/32) have their uses, but should be treated as special cases.

The nowide library is certainly useful within the (limited) scope of working with text obtained from the OS and passed to the OS, where you can make some assumptions, guess the encoding that the OS uses and do the conversions from and to UTF-8, BUT ... many text-handling applications also tend to use third-party libraries which have their own ideas about text encodings, and your library would be *much* more useful if it allowed them to "talk" to such libraries (or devices).

So let me reiterate some points I already mentioned in the earlier text-related discussions here:

1) Let's use std::string as an encoding-agnostic string, as it has always been. The encoding of the data stored in a string should be application dependent.

2) Let's implement a text storage class (and let's call it) text. This class would store text (internally in whatever encoding is the "best" on the specific platform) and would have the following functions defined:

   /* UTF-8 encoded */ std::string str(text t);
   - This function would return a std::string containing the text stored in t, encoded in UTF-8.

   template <typename SymbolicEncodingTag>
   text text::from(std::basic_string<typename SymbolicEncodingTag::CharT> s);
   - This function would convert the string stored in s to text, assuming that s is encoded in the encoding specified by SymbolicEncodingTag.

   template <typename SymbolicEncodingTag>
   std::basic_string<typename SymbolicEncodingTag::CharT> text::to(text t);
   - This function would convert the text stored in t to a string encoded in the encoding specified by SymbolicEncodingTag.
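
To make point 2 a bit more concrete, here is a rough sketch of what such an interface might look like. Everything in it is an assumption made for illustration only: the tag names utf8_tag and latin1_tag are hypothetical, the class simply stores UTF-8 internally on every platform, and the ISO-8859-1 conversion is written by hand (with no validation of UTF-8 input) where a real implementation would presumably delegate to Boost.Locale.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical encoding tags; the names are illustrative only.
struct utf8_tag   { using CharT = char; };
struct latin1_tag { using CharT = char; };  // ISO-8859-1

class text {
public:
    // Build a text from a string known to be in the encoding named by Tag.
    template <typename Tag>
    static text from(const std::basic_string<typename Tag::CharT>& s) {
        return text(decode(Tag{}, s));
    }

    // Produce a string in the encoding named by Tag.
    template <typename Tag>
    static std::basic_string<typename Tag::CharT> to(const text& t) {
        return encode(Tag{}, t.utf8_);
    }

    // Always returns UTF-8.
    friend std::string str(const text& t) { return t.utf8_; }

private:
    explicit text(std::string utf8) : utf8_(std::move(utf8)) {}

    // Assumption: UTF-8 as internal storage; a real class could pick
    // a different representation per platform.
    std::string utf8_;

    // UTF-8 is passed through unchanged (not validated in this sketch).
    static std::string decode(utf8_tag, const std::string& s) { return s; }
    static std::string encode(utf8_tag, const std::string& u) { return u; }

    // ISO-8859-1 byte values equal code points U+0000..U+00FF.
    static std::string decode(latin1_tag, const std::string& s) {
        std::string out;
        for (unsigned char b : s) {
            if (b < 0x80) {
                out.push_back(static_cast<char>(b));
            } else {  // two-byte UTF-8 sequence
                out.push_back(static_cast<char>(0xC0 | (b >> 6)));
                out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
            }
        }
        return out;
    }

    static std::string encode(latin1_tag, const std::string& u) {
        std::string out;
        for (std::size_t i = 0; i < u.size(); ++i) {
            unsigned char b = static_cast<unsigned char>(u[i]);
            if (b < 0x80) { out.push_back(static_cast<char>(b)); continue; }
            // Only U+0080..U+00FF (lead bytes 0xC2 and 0xC3) fit.
            if ((b == 0xC2 || b == 0xC3) && i + 1 < u.size()) {
                unsigned char c = static_cast<unsigned char>(u[++i]);
                if ((c & 0xC0) == 0x80) {
                    out.push_back(static_cast<char>(((b & 0x03) << 6) | (c & 0x3F)));
                    continue;
                }
            }
            throw std::range_error("text not representable in ISO-8859-1");
        }
        return out;
    }
};
```

With something like this, code receiving ISO-8859-1 data from a library would call text::from<latin1_tag>(s) once at the boundary, pass text around internally, and call str(t) or text::to<latin1_tag>(t) only when bytes in a particular encoding are needed again.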

The encoding tags would specify both concrete encodings like UTF-16 or ISO-8859-2, etc. and symbolic encodings like OS (which would autodetect the OS's encoding) or libFoo (which would use libFoo's encoding). Actually, the library would not have to specify many tags for concrete third-party libraries (maybe only the most popular). Instead it would provide some means for applications to define the tags based on their needs.

The text class would be used to store text in class members, function parameters, variables, etc. and would be converted to a string (in whatever encoding) only when the contents of the text have to be examined byte-by-byte, CP-by-CP, etc. or passed to the OS, a library or a device requiring a specific encoding. Also, initialization of text from C string literals should be handled correctly on various platforms/compilers.

If I'm not terribly mistaken, all the code for conversions between encodings is already part of Boost.Locale. Then all the useful things like the nowide::args class, the wrappers around iostreams, etc. could be implemented on top of that.

Best,
Matus