[lexical_cast] char types and UDTs

Say I have a wide-character, user-defined, streamable type like boost::wssub_match from the regex library (or now std::wssub_match). It's basically a pair of wstring iterators. Its stream insertion operator is defined like this:

    template <class charT, class ST, class BiIter>
    basic_ostream<charT, ST>&
    operator<<(basic_ostream<charT, ST>& os, const sub_match<BiIter>& m)
    {
        return (os << m.str());
    }

In case it's not obvious, it requires that the stream's character type be the same as the BiIter's value_type.

If I try to use boost::lexical_cast to cast this to something, it doesn't work. That's because lexical_cast doesn't know about wssub_match, and so doesn't know that its character type should be wchar_t. Instead, it defaults to char for any unknown type, and doesn't give a way to override this.

There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.

I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
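To make the character-type requirement concrete, here is a minimal sketch that uses only the standard library (std::wssub_match rather than the Boost type). The helper name first_group_via_wide_stream is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <type_traits>

// The character type of a sub_match is the value_type of its iterator.
static_assert(std::is_same<std::wssub_match::value_type, wchar_t>::value,
              "std::wssub_match holds wchar_t characters");

// Hypothetical helper: streams the first capture group of `text` into a wide
// stream and returns the result. Returns an empty string when nothing matches.
inline std::wstring first_group_via_wide_stream(const std::wstring& text,
                                                const std::wregex& pattern) {
    std::wsmatch match;
    if (!std::regex_search(text, match, pattern) || match.size() < 2) {
        return std::wstring();
    }
    std::wostringstream out;  // stream character type matches the sub_match
    out << match[1];          // forwards to out << match[1].str()
    // Inserting match[1] into a std::ostringstream instead is expected to fail
    // to compile, because the operator body would insert a std::wstring into a
    // narrow stream.
    return out.str();
}

// Usage example.
inline void example_first_group() {
    const std::wstring result =
        first_group_via_wide_stream(L"id=123", std::wregex(L"id=([0-9]+)"));
    assert(result == L"123");
}
```

A conversion facility that always picks a narrow stream for unknown types runs into exactly the commented-out case.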

2012/4/11 Eric Niebler <eric@boostpro.com>:
There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.
I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?
Specializing boost::detail::stream_char is not nice. Why don't you just specialize lexical_cast directly:

    namespace boost {
    template <class Target, class BiIter>
    Target lexical_cast(const sub_match<BiIter>& m)
    {
        return boost::lexical_cast<Target>(m.str());
    }
    } // namespace boost

[Not tested]

Using this approach you will also get a faster version of lexical_cast (one that does not copy data to an STL stream and does not construct heavy STL stream objects). This approach is also described in the lexical_cast trunk documentation, in the section 'Tuning classes for fast lexical conversions'.

I'll add your question to the FAQ section of the lexical_cast documentation.

--
Best regards, Antony Polukhin

On 4/10/2012 9:22 PM, Antony Polukhin wrote:
2012/4/11 Eric Niebler <eric@boostpro.com>:
There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.
I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?
Specializing boost::detail::stream_char is not nice. Why don't you just specialize lexical_cast directly:
namespace boost {
template <class Target, class BiIter>
Target lexical_cast(const sub_match<BiIter>& m)
{
    return boost::lexical_cast<Target>(m.str());
}
} // namespace boost
Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used. But my primary objection is below...
[Not tested]
Using this approach you will also get a faster version of lexical_cast (which does not copy data to STL stream and does not construct heavy STL stream objects). This approach is also described in lexical_cast trunk documentation, in section 'Tuning classes for fast lexical conversions'.
I'll add your question to the FAQ section of lexical_cast documentation.
Doesn't Boost have a policy against adding overloads in the boost namespace? Perhaps not, but maybe it should. I know the std namespace has such a restriction. It seems dubious to tell users to extend a library this way. A user should be able to do:

    Dst (*pfun)(Src const &) = &boost::lexical_cast<Dst, Src>;

and expect that to "work". If there is a bevy of overloads, then that will end up calling a different function.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
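The function-pointer concern can be reproduced without Boost. The sketch below uses a toy namespace with a generic convert template and an added overload for a wrapper type; all names (toy, Box, convert) are hypothetical stand-ins, not the real lexical_cast. A direct call picks the added overload, while a pointer formed with explicit template arguments binds to the primary template.

```cpp
#include <cassert>

namespace toy {

template <class T>
struct Box { T value; };

// Primary template, standing in for a generic conversion function.
template <class Dst, class Src>
Dst convert(const Src&) { return Dst(1); }

// "Extension" overload added later for Box<T>.
template <class Dst, class T>
Dst convert(const Box<T>&) { return Dst(2); }

}  // namespace toy

// Ordinary call: overload resolution prefers the more specialized Box overload.
inline int call_directly(const toy::Box<int>& b) {
    return toy::convert<int>(b);
}

// Taking the address with explicit <Dst, Src>: only the primary template has a
// matching signature, so the pointer bypasses the Box overload.
inline int call_through_pointer(const toy::Box<int>& b) {
    int (*pfun)(const toy::Box<int>&) = &toy::convert<int, toy::Box<int> >;
    return pfun(b);
}

// Usage example: the two call styles reach different functions.
inline void example_pointer_vs_call() {
    toy::Box<int> b = {0};
    assert(call_directly(b) == 2);
    assert(call_through_pointer(b) == 1);
}
```

So code that stores such a pointer would silently miss any tuning supplied through overloads.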

On Wed, Apr 11, 2012 at 8:54 AM, Eric Niebler <eric@boostpro.com> wrote:
Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.
But my primary objection is below...
You're right, it'd be nice if that just worked. -- Olaf

On 4/11/2012 1:13 AM, Olaf van der Spek wrote:
On Wed, Apr 11, 2012 at 8:54 AM, Eric Niebler <eric@boostpro.com> wrote:
Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.
Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?
But my primary objection is below...
You're right, it'd be nice if that just worked.
-- Eric Niebler BoostPro Computing http://www.boostpro.com

On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.
Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised
Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?
there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?
How could it automatically determine the necessary character type? Olaf

On 4/11/2012 9:36 AM, Olaf van der Spek wrote:
On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.
Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised
Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?
Because the C++ standard says sub_match should inherit from std::pair. But does it matter? There are other 3rd party types that I'm sure users would like to adapt to lexical_cast. The docs just say a type needs a stream insertion operator. Shouldn't that be sufficient? Besides, using std::iterator_range would not solve the problem because lexical_cast only knows about boost::iterator_range. Do you see?
there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?
How could it automatically determine the necessary character type?
It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types. -- Eric Niebler BoostPro Computing http://www.boostpro.com
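A documented, user-specializable character trait of the kind suggested here might look roughly like the sketch below. It is written against the standard library only; toy::stream_char and toy::to_text are hypothetical names, and this is not how boost::detail::stream_char is actually implemented.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

namespace toy {

// Hypothetical public trait: the character type used to stream a T.
// Defaults to char, mirroring the fallback described in the thread.
template <class T>
struct stream_char { typedef char type; };

// Knowledge the library would ship with.
template <>
struct stream_char<std::wstring> { typedef wchar_t type; };

// A specialization a user (or the regex library) could add:
// a sub_match streams as its iterator's value_type.
template <class BiIter>
struct stream_char<std::sub_match<BiIter> > {
    typedef typename std::iterator_traits<BiIter>::value_type type;
};

// Streams `source` into a string whose character type is chosen by the trait.
template <class Source>
std::basic_string<typename stream_char<Source>::type>
to_text(const Source& source) {
    typedef typename stream_char<Source>::type char_type;
    std::basic_ostringstream<char_type> out;
    out << source;
    return out.str();
}

}  // namespace toy

// Usage example.
inline void example_stream_char() {
    const std::wstring text = L"abc 123";
    std::wsmatch m;
    const bool found = std::regex_search(text, m, std::wregex(L"[0-9]+"));
    assert(found);
    assert(toy::to_text(m[0]) == L"123");  // wide stream selected via the trait
    assert(toy::to_text(42) == "42");      // unknown types fall back to char
}
```

The trait keeps the generic function template as the single entry point, so taking its address still works.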

2012/4/11 Eric Niebler <eric@boostpro.com>:
Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.
The sub_match ostream operator is defined as:

    std::ostream_iterator<char_type, Char, Traits> iout(sout); // where sout is std::basic_ostream<Char, Traits>
    std::copy(sub.first, sub.second, iout);

std::basic_ostream<Char, Traits> uses std::basic_string, so copying data to it can incur multiple dynamic allocations and multiple copies of the data.

2012/4/11 Eric Niebler <eric@boostpro.com>:
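For reference, the iterator-based insertion quoted in this message can be tried in isolation. The sketch below is a standard-library-only approximation; write_range is a hypothetical helper name, not the actual xpressive operator.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Writes the character range [first, last) to `out` one element at a time,
// in the style of the insertion operator quoted above. No temporary string
// is built by this function; any buffering happens inside the stream.
template <class Char, class Traits, class BiIter>
std::basic_ostream<Char, Traits>&
write_range(std::basic_ostream<Char, Traits>& out, BiIter first, BiIter last) {
    std::ostream_iterator<Char, Char, Traits> iout(out);
    std::copy(first, last, iout);
    return out;
}

// Usage example: stream the tail of a wide string into a wide string stream.
inline void example_write_range() {
    const std::wstring text = L"value=2012";
    std::wostringstream sout;
    write_range(sout, text.begin() + 6, text.end());
    assert(sout.str() == L"2012");
}
```

Whether this is cheaper than m.str() depends on the destination stream: a string-backed stream may still grow its buffer several times, which is the cost raised in this message.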
Olaf van der Spek wrote:
How could it automatically determine the necessary character type?
It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types.
I'll try to restore the boost::detail::stream_char functionality, but I won't try hard. That is not a nice solution. I'll also remove the notes about adding lexical_cast specializations, because they can break code that uses &boost::lexical_cast<Dst, Src>;

But that is not an optimal solution. An optimal solution would look like this (not tested, just to give the idea):

    template <class Target, class Source>
    Target lexical_cast_ext(Source&& s) {
        Target t;
        if (!::boost::try_lexical_cast(t, std::forward<Source>(s))) // No ADL
            BOOST_LCAST_THROW_BAD_CAST(Source, Target);
        return t; // RVO must be applied by compiler
    }

    template <class Target, class Source>
    bool try_lexical_cast_ext(Target& t, Source&& s) noexcept {
        using namespace boost::detail; // For getting default `construct_lexical_cast_in_trait' and `construct_lexical_cast_out_trait'
        const auto& in = construct_lexical_cast_in_trait(std::forward<Source>(s)); // ADL
        if (in.fail()) return false;
        typedef boost::mpl::identity<Target> target_tag;
        // Must have up to 4 construct_lexical_cast_out_trait
        // {with parameters const char*, const wchar_t*, const char16_t*, const char32_t*}
        return construct_lexical_cast_out_trait(target_tag(), in.begin(), in.end()) // ADL
            .assign_value(t); // returns true if conversion is OK
    }

The user will need to add one `construct_lexical_cast_in_trait' and up to four `construct_lexical_cast_out_trait' functions to the namespace of the user-defined class.
This will allow us to:

* get correct function pointers via &lexical_cast<Source, Target>; (the lexical_cast overloading solution breaks that)
* use stream operators << and >> in the default lexical_cast traits
* tune lexical_cast for fast conversions to string types (the deduce_char_traits<> specializing solution does not solve that)
* tune lexical_cast for fast conversions from string types (both solutions do not solve that)
* convert from any type to any type through usage of any character type (both solutions do not solve that)
* relax all the type requirements (the current implementation does not allow that)
* have a noexcept version (the current implementation does not allow that)

Any objections? Did I miss something except conversion tags for specifying the base?

But this looks more like a NEW lexical_cast library. It must be implemented and reviewed. I remember all the unsuccessful attempts to implement new conversion libraries, so unless there is huge interest in a new lexical_cast library, I will not give it a try.

All other solutions have drawbacks, and there will always be someone who needs more.

Conclusion: if a higher degree of control over conversions is required, std::stringstream and std::wstringstream must be used. Or find lots of people interested in a NEW lexical_cast library.

--
Best regards, Antony Polukhin
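The ADL-based extension idea in this message can be illustrated with a much smaller, char-only sketch. Everything here is hypothetical (toy::try_convert, the lexical_source hook, user::Percent); it simplifies the proposal to a single input hook and does not cover the four character types or the noexcept guarantee discussed above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace toy {
namespace detail {

// Default hook: obtain the textual form of any type with operator<<.
template <class Source>
std::string lexical_source(const Source& s) {
    std::ostringstream out;
    out << s;
    return out.str();
}

}  // namespace detail

// Reports failure through the return value instead of throwing bad_cast.
// `result` is only modified on success.
template <class Target, class Source>
bool try_convert(Target& result, const Source& source) {
    using detail::lexical_source;                    // fallback when ADL finds no hook
    std::istringstream in(lexical_source(source));   // unqualified call: ADL applies
    Target tmp = Target();
    if (!(in >> tmp)) return false;                  // could not parse a Target
    char extra;
    if (in >> extra) return false;                   // leftover characters
    result = tmp;
    return true;
}

}  // namespace toy

namespace user {

struct Percent { int value; };

// Hook placed in the type's own namespace and found by ADL. The type needs
// no operator<< at all, and nothing is added to namespace toy.
inline std::string lexical_source(const Percent& p) {
    return std::to_string(p.value);
}

}  // namespace user

// Usage example.
inline void example_try_convert() {
    long n = 0;
    const bool ok = toy::try_convert(n, user::Percent{42});
    assert(ok && n == 42);
}
```

Because the customization lives in free functions found by ADL, toy::try_convert stays a single template and its address remains unambiguous.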

On 11/04/12 19:57, Antony Polukhin wrote:
But that is not an optimal solution. Optimal solution would look like (not tested, required just to get the idea):
template<class Target, class Source>
Target lexical_cast_ext(Source&& s) {
    Target t;
    if (!::boost::try_lexical_cast(t, std::forward<Source>(s))) // No ADL
        BOOST_LCAST_THROW_BAD_CAST(Source, Target);
    return t; // RVO must be applied by compiler
}
template<class Target, class Source>
bool try_lexical_cast_ext(Target& t, Source&& s) noexcept {
    using namespace boost::detail; // For getting default `construct_lexical_cast_in_trait' and `construct_lexical_cast_out_trait'
    const auto& in = construct_lexical_cast_in_trait(std::forward<Source>(s)); // ADL
    if (in.fail()) return false;
    typedef boost::mpl::identity<Target> target_tag;
    // Must have up to 4 construct_lexical_cast_out_trait
    // {with parameters const char*, const wchar_t*, const char16_t*, const char32_t*}
    return construct_lexical_cast_out_trait(target_tag(), in.begin(), in.end()) // ADL
        .assign_value(t); // returns true if conversion is OK
}
User will need to add one `construct_lexical_cast_in_trait' and up to four `construct_lexical_cast_out_trait' functions to the namespace of user-defined class.
This will allow us to :
* get correct function pointers via &lexical_cast<Source, Target>; (lexical_cast overloading solution breaks that)
* use stream operators << and >> in default lexical_cast traits
* tune lexical_cast for fast conversions to string types(deduce_char_traits<> specializing solution does not solve that)
* tune lexical_cast for fast conversions from string types(both solutions do not solve that)
* convert from any type to any type through usage of any character type (both solutions do not solve that)
* relax all the type requirements (current implementation does not allow that)
* have a noexcept version (current implementation does not allow that)
Any objections? Did I miss something except conversion tags for specifying base?
Why is lexical_cast so complicated and full of quirks? All that stuff looks messy and over-engineered.

12.04.2012 15:20 "Mathias Gaunard" <mathias.gaunard@ens-lyon.org>:
Why is lexical_cast so complicated and full of quirks?
All that stuff looks messy and over-engineered.
That was an example of lexical_cast with all the wishes implemented. The current implementation is simple, and the user is only required to provide stream operators << and >> to work with it.

--
Best regards, Antony Polukhin

On 12/04/12 14:24, Antony Polukhin wrote:
12.04.2012 15:20 "Mathias Gaunard"<mathias.gaunard@ens-lyon.org>:
Why is lexical_cast so complicated and full of quirks?
All that stuff looks messy and over-engineered.
That was an example of lexical_cast with all the wishes implemented.
The current implementation is simple, and the user is only required to provide stream operators << and >> to work with it.
Yes, but apparently, that doesn't work with user-defined wide string types. Having to implement three traits that each must provide conversion facilities to all of char*, wchar_t*, char16_t* and char32_t* seems a bit much. Surely there must be a simpler solution.

2012/4/12 Mathias Gaunard <mathias.gaunard@ens-lyon.org>:
On 12/04/12 14:24, Antony Polukhin wrote:
The current implementation is simple, and the user is only required to provide stream operators << and >> to work with it.
Yes, but apparently, that doesn't work with user-defined wide string types.
Having to implement three traits that each must provide conversion facilities to all of char*, wchar_t*, char16_t* and char32_t* seems a bit much.
Surely there must be a simpler solution.
The first (and simplest) solution is: leave everything as is. If lexical_cast fails, use wstream.

The second solution: allow the user to specialize stream_char<>.

Other solutions would add many more complications and introduce an ugly conversion API. Specializing stream_char<> does not solve all the problems: a noexcept version of lexical_cast and fast user-defined conversions are still impossible.

There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. That solution requires the following meta-functions:

    has_output_stream_operator_for_char<UserType>::value
    has_output_stream_operator_for_wchar_t<UserType>::value
    has_output_stream_operator_for_char16_t<UserType>::value
    has_output_stream_operator_for_char32_t<UserType>::value

Are there any ideas on how that can be done in a *portable* way?

--
Best regards, Antony Polukhin
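The first option, falling back to a wide stream by hand, can be sketched as follows with the standard library only. convert_via_wstream is a hypothetical helper name, and reporting failure through a bool is an assumption made for this example.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Converts a wide sub_match to Target by round-tripping through a wide
// string stream. Returns false if extraction fails or characters are left
// over; `result` is only modified on success.
template <class Target>
bool convert_via_wstream(const std::wssub_match& source, Target& result) {
    std::wstringstream stream;
    stream << source;                        // wide insertion operator
    Target tmp = Target();
    if (!(stream >> tmp)) return false;      // could not parse a Target
    wchar_t extra;
    if (stream >> extra) return false;       // trailing characters
    result = tmp;
    return true;
}

// Usage example.
inline void example_wstream_fallback() {
    const std::wstring text = L"port: 8080";
    std::wsmatch m;
    const bool found = std::regex_search(text, m, std::wregex(L"[0-9]+"));
    assert(found);
    int port = 0;
    const bool ok = convert_via_wstream(m[0], port);
    assert(ok && port == 8080);
}
```

This works today without any library changes, at the cost of constructing a stream object for every conversion.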

There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. That solution requires the following meta-functions: has_output_stream_operator_for_char<UserType>::value has_output_stream_operator_for_wchar_t<UserType>::value has_output_stream_operator_for_char16_t<UserType>::value has_output_stream_operator_for_char32_t<UserType>::value
Are there any ideas on how that can be done in a *portable* way?
Perhaps Boost.TTI can be of help? Regards, Nate

2012/4/12 Nathan Ridge <zeratul976@hotmail.com>:
There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. That solution requires the following meta-functions: has_output_stream_operator_for_char<UserType>::value has_output_stream_operator_for_wchar_t<UserType>::value has_output_stream_operator_for_char16_t<UserType>::value has_output_stream_operator_for_char32_t<UserType>::value
Are there any ideas on how that can be done in a *portable* way?
Perhaps Boost.TTI can be of help?
Boost.TypeTraits looks like a good solution. Using has_left_shift<> and has_right_shift<> it is possible to create required meta-functions. Created ticket #6786 -- Best regards, Antony Polukhin
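As a rough idea of what such detection involves, here is a hand-rolled C++11 analogue written without Boost. It is not the Boost.TypeTraits has_left_shift implementation; the names is_streamable_to, deduced_stream_char and WideLabel are hypothetical and only handle char and wchar_t.

```cpp
#include <ostream>
#include <string>
#include <type_traits>
#include <utility>

namespace toy {

template <class...>
struct make_void { typedef void type; };

// True when inserting a const T& into std::basic_ostream<CharT>& is a
// well-formed expression. Only declarations are inspected, not bodies.
template <class T, class CharT, class = void>
struct is_streamable_to : std::false_type {};

template <class T, class CharT>
struct is_streamable_to<
    T, CharT,
    typename make_void<decltype(std::declval<std::basic_ostream<CharT>&>()
                                << std::declval<const T&>())>::type>
    : std::true_type {};

// Prefer char when it works, otherwise fall back to wchar_t.
template <class T>
struct deduced_stream_char {
    typedef typename std::conditional<is_streamable_to<T, char>::value,
                                      char, wchar_t>::type type;
};

// A user-defined type that can only be written to wide streams.
struct WideLabel { std::wstring text; };

inline std::wostream& operator<<(std::wostream& os, const WideLabel& label) {
    return os << label.text;
}

}  // namespace toy

// Compile-time checks of the deduction.
static_assert(toy::is_streamable_to<int, char>::value,
              "int can be written to narrow streams");
static_assert(!toy::is_streamable_to<toy::WideLabel, char>::value,
              "WideLabel has no narrow insertion operator");
static_assert(std::is_same<toy::deduced_stream_char<toy::WideLabel>::type,
                           wchar_t>::value,
              "WideLabel is deduced as a wide type");
```

One caveat: because only declarations are examined, an unconstrained operator template such as the sub_match one quoted at the top of the thread is declared for every stream character type, so a detector like this would probably report it as streamable to char as well. The result is best treated as a hint rather than proof.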

On Wed, Apr 11, 2012 at 7:12 PM, Eric Niebler <eric@boostpro.com> wrote:
On 4/11/2012 9:36 AM, Olaf van der Spek wrote:
On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.
Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised
Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?
Because the C++ standard says sub_match should inherit from std::pair. But does it matter? There are other 3rd party types that I'm sure users would like to adapt to lexical_cast. The docs just say a type needs a stream insertion operator. Shouldn't that be sufficient?
Not if you want to avoid copying the data.
Besides, using std::iterator_range would not solve the problem because lexical_cast only knows about boost::iterator_range. Do you see?
Actually, I don't think std::iterator_range exists. I probably meant boost::iterator_range.
there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?
How could it automatically determine the necessary character type?
It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types.
Maybe, but lexical_cast isn't the only code that wants to consume string-like types. IMO this should be handled in a more general way. -- Olaf
participants (5): Antony Polukhin, Eric Niebler, Mathias Gaunard, Nathan Ridge, Olaf van der Spek