[locale] Support of non US-ASCII character set for messages keys

Hello,

After reviewing all the discussion, I've decided to make the following changes to the interface to provide better support for non-US-ASCII keys.

What actually convinced me is the requirement to be able to include characters like © in the text...

Currently there are the following classes:

    template<typename CharType>
    class message_format : public std::locale::facet {
    public:
        ...
        typedef CharType char_type;
        virtual char_type const *get(int domain_id,char const *context,char const *id) const = 0;
        ...
    };

    class message {
    public:
        ...
        explicit message(char const *id);
        ...
        // convert message to localized message
        template<typename CharType>
        std::basic_string<CharType> str(std::locale const &locale) const;
    };
    ...
    inline message translate(char const *id);
    inline std::string gettext(char const *id,std::locale const &loc=std::locale());
    inline std::wstring wgettext(char const *id,std::locale const &loc=std::locale());
    ...

Basically, a message is created from a narrow id only and can be converted to multiple output formats: narrow, wide and so on.

    std::cout << translate("Hello") << std::endl;
    std::wcout << translate("Hello") << std::endl;

And you could call:

    message msg = translate("Hello");
    std::string hello = msg.str<char>();
    std::wstring whello = msg.str<wchar_t>();

and both work together.

I'll change it in the following way:

    template<typename CharType>
    class message_format : public std::locale::facet {
    public:
        ...
        typedef CharType char_type;
        virtual char_type const *get(int domain_id,char_type const *context,char_type const *id) const = 0;
        ...
    };

    template<typename CharType>
    class basic_message {
    public:
        typedef CharType char_type;
        typedef std::basic_string<char_type> string_type;
        ...
        explicit basic_message(char_type const *id);
        ...
        // convert message to localized message
        string_type str(std::locale const &locale) const;
    };

    typedef basic_message<char> message;
    typedef basic_message<wchar_t> wmessage;
    typedef basic_message<char16_t> u16message;
    typedef basic_message<char32_t> u32message;
    ...
    inline message translate(char const *id);
    inline wmessage translate(wchar_t const *id);
    inline std::string gettext(char const *id,std::locale const &loc=std::locale());
    inline std::wstring wgettext(wchar_t const *id,std::locale const &loc=std::locale());
    ...

Now you would have to write:

    std::cout << translate("Hello") << std::endl;
    std::wcout << translate(L"Hello") << std::endl;

And you should call:

    message msg = translate("Hello");
    wmessage wmsg = translate(L"Hello");
    std::string hello = msg.str();
    std::wstring whello = wmsg.str();

Additionally, you would be able to specify the encoding of the source strings when adding a domain:

    boost::locale::generator gen;
    gen.add_messages_domain("myprogram","windows-936");

The default would always be UTF-8.

So if you write in the program:

    std::cout << translate("平和") << std::endl;

Under GCC with UTF-8 sources you don't have to do anything. If you are using MSVC, you'll have to provide a charset name as shown above or use u8"平和".

Of course this would break the API for users who currently use Boost.Locale (and I know of at least several projects that will suffer). But this would probably bring it to some logical point and prevent these questions from arising.

Of course you should remember that untranslated non-US-ASCII strings would be converted at run time to the current locale's encoding.

Regards,
Artyom Beilis

P.S.: Of course the documentation will still discourage programmers from using non-US-ASCII keys, as they may not be displayed properly in local character sets and may confuse users.
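To make the proposed shape concrete, here is a minimal, self-contained sketch of a message type templated on the character type, with one translate overload per key type. It is not Boost.Locale code: the sketch namespace, the in-memory catalog() table and self_check() are hypothetical stand-ins, and the locale argument of str() is omitted to keep the example small.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical in-memory catalog, one table per character type.
// A real backend would load translations from a message catalog file.
template<typename CharType>
std::map<std::basic_string<CharType>, std::basic_string<CharType> > &catalog()
{
    static std::map<std::basic_string<CharType>, std::basic_string<CharType> > table;
    return table;
}

template<typename CharType>
class basic_message {
public:
    typedef CharType char_type;
    typedef std::basic_string<char_type> string_type;

    explicit basic_message(char_type const *id) : id_(id) {}

    // Return the translation, or the key itself when none is found.
    string_type str() const
    {
        typedef std::map<string_type, string_type> table_type;
        table_type const &t = catalog<char_type>();
        typename table_type::const_iterator p = t.find(id_);
        return p == t.end() ? id_ : p->second;
    }

private:
    string_type id_;
};

typedef basic_message<char> message;
typedef basic_message<wchar_t> wmessage;

// The key type now selects the message type, and so the output type.
inline message translate(char const *id) { return message(id); }
inline wmessage translate(wchar_t const *id) { return wmessage(id); }

// Small check of the behaviour described above.
inline void self_check()
{
    catalog<char>()["Hello"] = "Bonjour";
    assert(translate("Hello").str() == "Bonjour");  // translated narrow key
    assert(translate("Bye").str() == "Bye");        // untranslated key falls back to itself
}

} // namespace sketch
```

With this layout a narrow key can only produce a narrow string and a wide key only a wide string, which is the API break mentioned in the message: code that streamed translate("Hello") to std::wcout would have to switch to translate(L"Hello").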

On Thu, Apr 28, 2011 at 5:17 AM, Artyom <artyomtnk@yahoo.com> wrote:
[...]
I appreciate that you finally changed your mind.
-- Ryou Ezoe

Just some thoughts about the design.
What is the format of the string that specifies the encoding, such as "windows-936"? Does it have to be a string rather than, say, an enum?
Why is there no "translate" that takes a string of char16_t or char32_t? Although I think it will take years for other compilers to support C++0x's new encoding prefixes.
-- Ryou Ezoe

Just some thoughts about the design.
What is the format of the string that specifies the encoding, such as "windows-936"? Does it have to be a string rather than, say, an enum?
I've mentioned it before, and the response actually exists in the summary of the review. Also, in this case the encoding is defined only once, when the locale object is created, so there are no performance issues. The second point is that:
a. Support for different encodings depends on the specific backend and may change as new versions of ICU or iconv are released, so there is no need to provide an enum.
b. The concept of numeric codepages is something Windows-specific. All other APIs around use strings to represent the encoding.
Why is there no "translate" that takes a string of char16_t or char32_t?
Of course, char16_t/char32_t overloads are there for compilers that support them. I didn't show all possible interfaces, just examples.
Although I think it will take years for other compilers to support C++0x's new encoding prefixes.
Not a single compiler I tested (MSVC10, Intel, GCC-4.6, SunCC) supports C++0x characters properly, so it will take some time.
Artyom
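The argument for string encoding names can be illustrated with a small sketch: a backend keeps an open set of names it can handle, so supporting a new encoding means registering another string rather than extending an enum in the public interface. Everything here is hypothetical (the sketch namespace, normalize(), and the backend class), and the name normalization rule is an assumption made for the example, not the behaviour of ICU, iconv or Boost.Locale.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

namespace sketch {

// Assumed normalization: keep letters and digits only, lower-cased,
// so "UTF-8", "utf8" and "Utf_8" compare equal.
inline std::string normalize(std::string const &name)
{
    std::string out;
    for(std::string::size_type i = 0; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        if(std::isalnum(c))
            out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Hypothetical backend: the set of encodings is data, not part of the API.
class backend {
public:
    void add_encoding(std::string const &name) { known_.insert(normalize(name)); }
    bool supports(std::string const &name) const { return known_.count(normalize(name)) != 0; }
private:
    std::set<std::string> known_;
};

} // namespace sketch
```

A caller would query the backend once, when the locale object is created, which matches the point that the lookup cost is paid only at setup time.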
participants (2)
- Artyom
- Ryou Ezoe