
On 13 August 2011 20:10, Daniel James <dnljms@gmail.com> wrote:
On 13 August 2011 19:02, Dave Abrahams <dave@boostpro.com> wrote:
I think I agree with Artyom here. *Somebody* has to decide how that datatype will be interpreted when we receive it. Unless we refuse altogether to accept std::string in our interfaces (which sounds like a bad idea to me), why not make the decision that it's UTF-8?
Because if the native encoding isn't UTF-8 that will give the wrong result for cases such as:
    int main(int argc, char** argv) {
        // ....
        boost::filesystem::path p(argv[0]);
        // ....
    }
As a reader of the long discussions about a new string class, it seems to me the only solution left is to pass the encoding as a separate entity from the string to those functions that need it. Because:

* A new string class only pushes the problem one level further up from the library, and imposes unnecessary copying of data on those who don't need or want it. There's a myriad of string classes already; yet another adapter/container doesn't make things cleaner.
* Enforcing UTF-8 possibly breaks existing applications, which assume the current behaviour (whatever that is).

With the above options discarded, I see this (in some form):

    enum string_encoding {
        platform_specific,
        utf_8,
    };

    {
        boost::filesystem::path(const char* str,
                                boost::string_encoding e = boost::platform_specific);
    }

If boost were to settle for only two viable encodings, i.e. platform_specific (or whatever name matches the current behaviour in related libraries) and utf_8, it would at least imply that utf_8 is the preferred option for portable code, even if libraries default to platform_specific for backward compatibility. The utf_8 encoding would take the route that Artyom advocates, but in a more explicit way.

Well, that's my two euro cents ;)

cheers,
- Christian
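
To make the proposal above a little more concrete, here is a minimal, self-contained sketch of a constructor that takes the encoding as a separate, defaulted argument. Everything in it is an assumption for illustration: the `sketch` namespace and the `path` class are hypothetical stand-ins, not boost::filesystem::path, and no such overload is claimed to exist in Boost. It only shows that existing call sites keep compiling and keep the platform_specific meaning, while portable code can opt in to utf_8 explicitly.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, not part of Boost

// The two encodings named in the proposal.
enum string_encoding {
    platform_specific,  // whatever the library does today
    utf_8               // explicit opt-in for portable code
};

// Hypothetical stand-in for a path type. It only records the bytes and the
// encoding the caller declared; it performs no conversion or validation.
class path {
public:
    path(const char* str, string_encoding e = platform_specific)
        : bytes_(str), encoding_(e) {}

    const std::string& bytes() const { return bytes_; }
    string_encoding encoding() const { return encoding_; }

private:
    std::string bytes_;
    string_encoding encoding_;
};

}  // namespace sketch
```

With this shape, `sketch::path p(argv[0]);` keeps the platform-specific interpretation, so the argv case raised earlier in the thread is unaffected, whereas `sketch::path q("data.txt", sketch::utf_8);` states the encoding at the call site.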