Re: [boost] Formal Review Request: Boost.String.Convert

Upgraded to version 0.10. Added a SFINAE-based boost::string::is_string<T> check, which makes it possible to restrict the applicability of the conversion functions. They now require one of the parameters to be a string (in the broad sense: C strings, std::(w)string, char/wchar_t-based containers) and the other *not* to be a string. I have also straightened out the main boost::string::convert() interface (it was quite broken), so all the tests now run as expected. g++ 4.2.4 (Linux) and Visual Studio 2008 are happy.
Thanks,
Vladimir.
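For readers unfamiliar with the technique, here is a minimal sketch of what a SFINAE-based "is this a string?" check can look like. It is not the code from the library under review. The namespace `sketch`, the helpers `is_char` and `is_string_impl`, and the function `exactly_one_is_string` are names invented for this illustration, and the sketch assumes a C++11 compiler rather than the compilers mentioned above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch {

// Character types that count as string elements in this sketch (assumption).
template <typename T> struct is_char : std::false_type {};
template <> struct is_char<char> : std::true_type {};
template <> struct is_char<wchar_t> : std::true_type {};

// Primary template: by default a type is not a string.
template <typename T, typename Enable = void>
struct is_string_impl : std::false_type {};

// C strings: pointers to (possibly const) char or wchar_t.
template <typename T>
struct is_string_impl<T*, void>
    : is_char<typename std::remove_cv<T>::type> {};

// Containers whose value_type is char or wchar_t, e.g. std::(w)string.
// If T::value_type does not exist, SFINAE discards this specialization
// and the primary template is used instead.
template <typename T>
struct is_string_impl<
    T, typename std::enable_if<is_char<typename T::value_type>::value>::type>
    : std::true_type {};

// Decay first so that references, cv-qualifiers and arrays
// (string literals) are handled uniformly.
template <typename T>
struct is_string : is_string_impl<typename std::decay<T>::type> {};

// The rule described above: exactly one of the two types is a string.
template <typename A, typename B>
constexpr bool exactly_one_is_string() {
    return is_string<A>::value != is_string<B>::value;
}

}  // namespace sketch

static_assert(sketch::is_string<std::string>::value, "std::string");
static_assert(sketch::is_string<const wchar_t*>::value, "wide C string");
static_assert(sketch::is_string<std::vector<char>>::value, "char container");
static_assert(!sketch::is_string<int>::value, "int is not a string");
static_assert(!sketch::is_string<std::vector<int>>::value, "int container");
```

A conversion function could then be constrained with `std::enable_if` on `exactly_one_is_string<In, Out>()`, so that string-to-string and non-string-to-non-string calls simply do not match the overload.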

Upgraded to version 0.10. Added a SFINAE-based boost::string::is_string<T> check, which makes it possible to restrict the applicability of the conversion functions. They now require one of the parameters to be a string (in the broad sense: C strings, std::(w)string, char/wchar_t-based containers) and the other *not* to be a string. I have also straightened out the main boost::string::convert() interface (it was quite broken), so all the tests now run as expected.
Why not allow conversion between different string types?
Possible applications:
- std::string <--> std::wstring or similar (based on a future Boost.Unicode library)
- conversion between different symbol (character) sets
Regards
Hartmut

Hartmut,
Why not allow conversion between different string types?
Yes, I believe that is a very sensible question. In fact, I've tightened the type checks for the implemented boost::string::convert() to minimize possible signature clashes with future extensions (via specializations/overloads). In its current form boost::string::convert() is essentially a replacement for lexical_cast with added formatting, locale, etc. support. To add, say, u8string<->wstring conversion support we'll need to add the overloads
std::wstring convert(std::u8string)
std::u8string convert(std::wstring)
Possible applications:
- std::string <--> std::wstring or similar (based on a future Boost.Unicode library)
I am not sure we can do std::string <-> std::wstring unless we know what std::string represents (currently it can be UTF8 or MBCS). If, with the introduction of std::u8string, std::string is guaranteed to be MBCS, then I guess we can have std::string <-> std::wstring as well.
- conversion between different symbol (character) sets
Currently convert() relies heavily on the supplied types. Are we going to have distinct types for different symbol (character) sets? If not, then we might proceed as I've done for the throwing behavior (i.e. run-time configuration vs. compile-time configuration):
int i = boost::string::convert(str, -1) >> boost::throw_t();
to pass a clue/directive about what to do. Similarly we might do
string new_set_str = boost::string::convert(old_set_str) >> new_set_directive();
Just thinking out loud. Does it look anywhere close to what you had in mind?
V.
P.S. Thank you for your Spirit conversion snippet the other day. Appreciated.
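To make the "directive via operator>>" idea concrete, here is a minimal sketch of how such an interface can be built: convert() returns a small proxy object, operator>> records a run-time directive on it, and the actual conversion happens when the proxy is implicitly converted to the target type. This is an illustration only, not the proposed library's implementation. The namespace `sketch`, the class `conversion`, the tag `throw_on_failure` and the helper `throws_for` are all hypothetical names, and the conversion itself is a plain std::istringstream extraction.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical directive tag: request an exception instead of the fallback.
struct throw_on_failure {};

// Proxy returned by convert(); the conversion is deferred until the
// proxy is converted to Out, so directives can be applied first.
template <typename Out>
class conversion {
public:
    conversion(std::string in, Out fallback)
        : in_(std::move(in)), fallback_(fallback), throw_(false) {}

    // Record the directive and return the proxy for further chaining.
    conversion& operator>>(throw_on_failure) {
        throw_ = true;
        return *this;
    }

    // Perform the conversion when the result is actually needed.
    operator Out() const {
        std::istringstream stream(in_);
        Out value;
        if (stream >> value) return value;
        if (throw_) throw std::invalid_argument("conversion failed: " + in_);
        return fallback_;
    }

private:
    std::string in_;
    Out fallback_;
    bool throw_;
};

template <typename Out>
conversion<Out> convert(const std::string& in, Out fallback) {
    return conversion<Out>(in, fallback);
}

// Helper for the usage example: does the throwing variant throw for text?
inline bool throws_for(const std::string& text) {
    try {
        int ignored = convert(text, -1) >> throw_on_failure();
        (void)ignored;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}

}  // namespace sketch
```

With this shape, `int i = sketch::convert("oops", -1);` yields the fallback, while appending `>> sketch::throw_on_failure()` turns the same failure into an exception. A character-set directive could be recorded on the proxy in the same way.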

Why not allow conversion between different string types?
Yes, I believe that is a very sensible question. In fact, I've tightened the type checks for the implemented boost::string::convert() to minimize possible signature clashes with future extensions (via specializations/overloads).
In its current form boost::string::convert() is essentially a replacement for lexical_cast with added formatting, locale, etc. support. To add, say, u8string<->wstring conversion support we'll need to add the overloads
std::wstring convert(std::u8string)
std::u8string convert(std::wstring)
Makes sense.
Possible applications:
- std::string <--> std::wstring or similar (based on a future Boost.Unicode library)
I am not sure we can do std::string <-> std::wstring unless we know what std::string represents (currently it can be UTF8 or MBCS). If, with the introduction of std::u8string, std::string is guaranteed to be MBCS, then I guess we can have std::string <-> std::wstring as well.
That's what I meant. Sorry for being unclear.
- conversion between different symbol (character) sets
Currently convert() relies heavily on the supplied types. Are we going to have distinct types for different symbol (character) sets? If not, then we might proceed as I've done for the throwing behavior (i.e. run-time configuration vs. compile-time configuration):
int i = boost::string::convert(str, -1) >> boost::throw_t();
to pass a clue/directive about what to do. Similarly we might do
string new_set_str = boost::string::convert(old_set_str) >> new_set_directive();
Just thinking out loud. Does it look anywhere close to what you had in mind?
In Spirit we use
using namespace boost::spirit::ascii;
(or similar) to tie in a specific character set. I'm not sure if this is a viable solution for you.
Regards
Hartmut

Vladimir Batov wrote:
- std::string <--> std::wstring or similar (based on a future Boost.Unicode library)
I am not sure we can do std::string <-> std::wstring unless we know what std::string represents (currently it can be UTF8 or MBCS). If, with the introduction of std::u8string, std::string is guaranteed to be MBCS, then I guess we can have std::string <-> std::wstring as well.
I think it would be sufficient to rely on the locale to make decisions about the char nature. I have solved this particular task in Boost.Log. You may find it in boost/log/detail/code_conversion.hpp and libs/log/src/code_conversion.cpp, if you're interested.

I think it would be sufficient to rely on the locale to make decisions about the char nature. I have solved this particular task in Boost.Log. You may find it in boost/log/detail/code_conversion.hpp and libs/log/src/code_conversion.cpp, if you're interested.
Andrey, thank you for the pointer. I'll definitely have a look. I am somewhat cautious, though, because of past experience. We once used OpenLDAP on Windows. For all internal purposes we used the platform's encoding (MBCS), but for anything OpenLDAP-related we had to handle UTF-8 just as heavily. That is, we had to have explicit UTF-8<->MBCS<->WIDE conversions and could not rely on any support from the locale.
V.

Vladimir Batov wrote:
I think it would be sufficient to rely on the locale to make decisions about the char nature. I have solved this particular task in Boost.Log. You may find it in boost/log/detail/code_conversion.hpp and libs/log/src/code_conversion.cpp, if you're interested.
Andrey, thank you for the pointer. I'll definitely have a look. I am somewhat cautious, though, because of past experience. We once used OpenLDAP on Windows. For all internal purposes we used the platform's encoding (MBCS), but for anything OpenLDAP-related we had to handle UTF-8 just as heavily. That is, we had to have explicit UTF-8<->MBCS<->WIDE conversions and could not rely on any support from the locale.
The locale can be adjusted, if needed. One only has to substitute the codecvt facet in the locale to use different encoding rules. I think that your particular UTF8<->MBCS<->WIDE case could have been solved this way, too.
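As an illustration of the locale-driven approach, here is a minimal sketch that widens a narrow string through whatever std::codecvt facet the supplied locale carries. It is not the Boost.Log code referred to above. The namespace `sketch` and the function `widen` are names invented for this example, and error handling is reduced to throwing on any result other than `ok`.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

namespace sketch {

// Convert a narrow string to a wide one using the codecvt facet of loc.
// The encoding rules are decided entirely by that facet.
inline std::wstring widen(const std::string& narrow, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    if (narrow.empty()) return std::wstring();

    // Each wide character consumes at least one input byte, so the
    // input size is an upper bound for the output size.
    std::wstring wide(narrow.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const std::codecvt_base::result result = facet.in(
        state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);

    if (result != std::codecvt_base::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    // Trim to the number of wide characters actually produced.
    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}

}  // namespace sketch
```

A locale with different encoding rules can be built with the standard constructor `std::locale(const std::locale& other, Facet* f)`, passing a custom facet derived from `std::codecvt<wchar_t, char, std::mbstate_t>`. The same `widen` call then follows the substituted rules without any change at the call site.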
participants (4)
- Andrey Semashev
- Hartmut Kaiser
- Vladimir Batov
- Vladimir.Batov@wrsa.com.au