[lexical_cast] char types and UDTs

Eric Niebler

11 Apr 2012

2:50 a.m.

Say I have a wide-character, user-defined, streamable type like boost::wssub_match from the regex library (or now std::wssub_match). It's basically a pair of wstring iterators. Its stream insertion operator is defined like this:

    template <class charT, class ST, class BiIter>
    basic_ostream<charT, ST>&
    operator<<(basic_ostream<charT, ST>& os, const sub_match<BiIter>& m)
    {
        return (os << m.str());
    }

If it's not obvious, it requires that the stream's character type is the same as the BiIter's value_type.

If I were to try to use boost::lexical_cast to cast this to something, it won't work. That's because lexical_cast doesn't know about wssub_match, and so doesn't know that its character type should be wchar_t. Instead, it defaults to char for any unknown types, and doesn't give a way to override this.

There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.

I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
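The character-type requirement described above can be sketched with the standard library alone. This is a minimal illustration, not code from the thread: the helper name stream_sub_match is hypothetical, and it uses std::wssub_match rather than the Boost type.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Boost or the standard library):
// finds the first match of `pattern` in `text` and streams the matched
// sub_match into a stream whose character type matches the iterators.
std::wstring stream_sub_match(const std::wstring& text,
                              const std::wstring& pattern)
{
    std::wsmatch m;
    if (!std::regex_search(text, m, std::wregex(pattern)))
        return std::wstring();

    // m[0] is a std::wssub_match; its value_type is wchar_t, so the
    // stream must be a wide stream for `os << m.str()` to be valid.
    std::wostringstream out;
    out << m[0];
    return out.str();

    // By contrast, `std::ostringstream narrow; narrow << m[0];` is
    // expected not to compile, because the operator's body would try
    // to insert a std::wstring into a char stream.
}
```

For example, stream_sub_match(L"width=120", L"[0-9]+") returns L"120". A conversion facility that always picks a char stream for unknown types runs into the narrow-stream case noted in the comment.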


Antony Polukhin

11 Apr

4:22 a.m.

2012/4/11 Eric Niebler <eric@boostpro.com>:

...

There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.

I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?

Specializing boost::detail::stream_char is not nice. Why don't you just specialize lexical_cast directly:

    namespace boost {

    template <class Target, class BiIter>
    Target lexical_cast(const sub_match<BiIter>& m) {
        return boost::lexical_cast<Target>(m.str());
    }

    } // namespace boost

[Not tested]

Using this approach you will also get a faster version of lexical_cast (which does not copy data to an STL stream and does not construct heavy STL stream objects). This approach is also described in the lexical_cast trunk documentation, in the section 'Tuning classes for fast lexical conversions'.

I'll add your question to the FAQ section of the lexical_cast documentation.

--
Best regards, Antony Polukhin

Eric Niebler

6:54 a.m.

On 4/10/2012 9:22 PM, Antony Polukhin wrote:

...

2012/4/11 Eric Niebler <eric@boostpro.com>:

...
There is an internal trait, boost::detail::stream_char, that lexical_cast uses to determine what the character type of the source is (if any). It's specialized on only a fixed number of types. Since it's an implementation detail, it can't be used to extend the set of types. Also, I tried specializing it. In past versions of Boost that just worked, but it doesn't anymore.

I think being able to extend lexical_cast to support types like (std|boost)::wssub_match is an essential feature. Thoughts on how to get there from here?

Specializing boost::detail::stream_char is not nice. Why don't you just specialize lexical_cast directly:

namespace boost {

template <class Target, class BiIter>
Target lexical_cast(const sub_match<BiIter>& m) {
    return boost::lexical_cast<Target>(m.str());
}

} // namespace boost

Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used. But my primary objection is below...

...

[Not tested]

Using this approach you will also get a faster version of lexical_cast (which does not copy data to STL stream and does not construct heavy STL stream objects). This approach is also described in lexical_cast trunk documentation, in section 'Tuning classes for fast lexical conversions'.

I'll add your question to the FAQ section of lexical_cast documentation.

Doesn't boost have a policy against adding overloads in the boost namespace? Perhaps not, but maybe it should. I know the std namespace has such a restriction. It seems dubious telling users to extend a library this way. A user should be able to do:

    Dst (*pfun)(Src const &) = &boost::lexical_cast<Dst, Src>;

and expect that to "work". If there are a bevy of overloads, then that will end up calling a different function.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
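The function-pointer concern can be reproduced without Boost. The sketch below uses stand-in names (lib_sketch::cast and lib_sketch::wrapper are hypothetical and play the roles of lexical_cast and sub_match); it only illustrates how overload resolution differs between a direct call and taking the address with explicit template arguments.

```cpp
#include <cassert>
#include <string>

namespace lib_sketch {

// Stand-in for the library's generic conversion function template.
template <class Target, class Source>
Target cast(const Source&) { return Target("generic"); }

// Stand-in for a user-defined type such as sub_match<BiIter>.
template <class T>
struct wrapper { T value; };

// Stand-in for an overload added later to customize the conversion.
template <class Target, class T>
Target cast(const wrapper<T>&) { return Target("overload"); }

} // namespace lib_sketch

// A direct call selects the more specialized overload.
std::string direct_call()
{
    lib_sketch::wrapper<int> w = { 1 };
    return lib_sketch::cast<std::string>(w);
}

// Taking the address with <Target, Source> spelled out can only match
// the generic template, so the customization is silently bypassed.
std::string via_pointer()
{
    lib_sketch::wrapper<int> w = { 1 };
    std::string (*pfun)(const lib_sketch::wrapper<int>&) =
        &lib_sketch::cast<std::string, lib_sketch::wrapper<int> >;
    return pfun(w);
}
```

Here direct_call() goes through the added overload while via_pointer() goes through the generic template, which is the "different function" outcome the message warns about.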

Olaf van der Spek

8:13 a.m.

On Wed, Apr 11, 2012 at 8:54 AM, Eric Niebler <eric@boostpro.com> wrote:

...

Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.

You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.

...

But my primary objection is below...

You're right, it'd be nice if that just worked. -- Olaf

Eric Niebler

3:18 p.m.

On 4/11/2012 1:13 AM, Olaf van der Spek wrote:

...

On Wed, Apr 11, 2012 at 8:54 AM, Eric Niebler <eric@boostpro.com> wrote:

...
Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.

You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.

Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?

...

...
But my primary objection is below...

You're right, it'd be nice if that just worked.

-- Eric Niebler BoostPro Computing http://www.boostpro.com

Olaf van der Spek

4:36 p.m.

On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:

...

...
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.

Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised

Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?

...

there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?

How could it automatically determine the necessary character type? Olaf

Eric Niebler

5:12 p.m.

On 4/11/2012 9:36 AM, Olaf van der Spek wrote:

...

On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:

...
...
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.

Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised

Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?

Because the C++ standard says sub_match should inherit from std::pair. But does it matter? There are other 3rd party types that I'm sure users would like to adapt to lexical_cast. The docs just say a type needs a stream insertion operator. Shouldn't that be sufficient? Besides, using std::iterator_range would not solve the problem because lexical_cast only knows about boost::iterator_range. Do you see?

...

...
there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?

How could it automatically determine the necessary character type?

It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types. -- Eric Niebler BoostPro Computing http://www.boostpro.com
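As a rough sketch of what a documented, user-specializable trait could look like: the names below (sketch::stream_char, sketch::to_text) are hypothetical and are not the actual boost::detail::stream_char interface. The trait defaults to char, and a partial specialization tells the conversion which stream character type to use for sub_match.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical public trait: the character type of the stream used to
// serialize a Source. Unknown types default to char.
template <class Source>
struct stream_char { typedef char type; };

// User-provided specialization: a sub_match streams with the
// character type of its iterators.
template <class BiIter>
struct stream_char<std::sub_match<BiIter> > {
    typedef typename std::iterator_traits<BiIter>::value_type type;
};

// Converts a value to text through a stream of the trait-selected
// character type.
template <class Source>
std::basic_string<typename stream_char<Source>::type>
to_text(const Source& s)
{
    typedef typename stream_char<Source>::type char_type;
    std::basic_ostringstream<char_type> out;
    out << s;
    return out.str();
}

} // namespace sketch

// Example use: extract the digits from a wide string and convert the
// resulting std::wssub_match to text.
std::wstring sketch_demo()
{
    const std::wstring text = L"answer=42";
    std::wsmatch m;
    std::regex_search(text, m, std::wregex(L"[0-9]+"));
    return sketch::to_text(m[0]);
}
```

With the specialization in place, sketch_demo() yields a std::wstring, while sketch::to_text(7) still goes through a narrow stream and yields a std::string.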

Antony Polukhin

5:57 p.m.

2012/4/11 Eric Niebler <eric@boostpro.com>:

...

Calling m.str() creates a temporary std::wstring object, which incurs a dynamic allocation and is slow. Speaking for my own library (xpressive), sub_match has an optimized stream insertion operator. It should be used.

sub_match ostream operator is defined as:

    std::ostream_iterator<char_type, Char, Traits> iout(sout); // where sout is std::basic_ostream<Char, Traits>
    std::copy(sub.first, sub.second, iout);

std::basic_ostream<Char, Traits> uses std::basic_string, so copying data to it can incur multiple dynamic allocations and multiple copies of the data.

2012/4/11 Eric Niebler <eric@boostpro.com>:

...

Olaf van der Spek wrote:

...
How could it automatically determine the necessary character type?

It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types.

I'll try to restore boost::detail::stream_char functionality, but I won't try hard. That is not a nice solution. I'll also remove the notes about adding lexical_cast specializations, because that can break code that uses &boost::lexical_cast<Dst, Src>. But that is not an optimal solution. An optimal solution would look like this (not tested, just to give the idea):

    template <class Target, class Source>
    Target lexical_cast_ext(Source&& s) {
        Target t;
        if (!::boost::try_lexical_cast(t, std::forward<Source>(s))) // No ADL
            BOOST_LCAST_THROW_BAD_CAST(Source, Target);
        return t; // RVO must be applied by compiler
    }

    template <class Target, class Source>
    bool try_lexical_cast_ext(Target& t, Source&& s) noexcept {
        using namespace boost::detail; // For getting default `construct_lexical_cast_in_trait' and `construct_lexical_cast_out_trait'
        const auto& in = construct_lexical_cast_in_trait(std::forward<Source>(s)); // ADL
        if (in.fail()) return false;

        typedef boost::mpl::identity<Target> target_tag;

        // Must have up to 4 construct_lexical_cast_out_trait
        // {with parameters const char*, const wchar_t*, const char16_t*, const char32_t*}
        return construct_lexical_cast_out_trait(target_tag(), in.begin(), in.end()) // ADL
            .assign_value(t); // returns true if conversion is OK
    }

The user will need to add one `construct_lexical_cast_in_trait' and up to four `construct_lexical_cast_out_trait' functions to the namespace of the user-defined class.

This will allow us to:

* get correct function pointers via &lexical_cast<Source, Target>; (lexical_cast overloading solution breaks that)
* use stream operators << and >> in default lexical_cast traits
* tune lexical_cast for fast conversions to string types (deduce_char_traits<> specializing solution does not solve that)
* tune lexical_cast for fast conversions from string types (both solutions do not solve that)
* convert from any type to any type through usage of any character type (both solutions do not solve that)
* relax all the type requirements (current implementation does not allow that)
* have a noexcept version (current implementation does not allow that)

Any objections? Did I miss something except conversion tags for specifying base?

But this looks more like a NEW lexical_cast library. It must be implemented and reviewed. I remember all the unsuccessful attempts to implement new conversion libraries, so unless there is huge interest in a new lexical_cast library, I will not give it a try.

All other solutions have drawbacks, and there will always be someone who needs more.

Conclusion: if a higher degree of control over conversions is required, std::stringstream and std::wstringstream must be used. Or find lots of people interested in a NEW lexical_cast library.

--
Best regards, Antony Polukhin
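The fallback named in the conclusion, using std::wstringstream directly, might look like the following. This is a sketch under stated assumptions: the function names try_parse_int and whole_match are hypothetical, and the "whole input must be consumed" rule is chosen here to mimic lexical_cast-style strictness.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: converts a wide sub_match to int through a
// wide string stream, reporting failure with a bool instead of an
// exception.
bool try_parse_int(const std::wssub_match& m, int& out)
{
    std::wstringstream ss;
    ss << m;                      // wide stream matches wchar_t iterators

    int value = 0;
    if (!(ss >> value))
        return false;             // no leading integer

    wchar_t extra;
    if (ss >> extra)
        return false;             // trailing non-whitespace characters

    out = value;
    return true;
}

// Hypothetical helper: builds a sub_match that spans a whole string.
// The string must outlive the returned object.
std::wssub_match whole_match(const std::wstring& s)
{
    std::wssub_match m;
    m.first = s.begin();
    m.second = s.end();
    m.matched = true;
    return m;
}
```

The caller picks the stream's character type explicitly, so no trait or overload is needed, at the cost of constructing a stream object for every conversion.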

Mathias Gaunard

12 Apr

11:19 a.m.

On 11/04/12 19:57, Antony Polukhin wrote:

...

But that is not an optimal solution. Optimal solution would look like (not tested, required just to get the idea):

template<class Target, class Source>

Target lexical_cast_ext(Source&& s) {

Target t;

if (!::boost::try_lexical_cast(t, std::forward<Source>(s))) // No ADL

BOOST_LCAST_THROW_BAD_CAST(Source, Target);

return t; // RVO must be applied by compiler

}

template<class Target, class Source>

bool try_lexical_cast_ext(Target& t, Source&& s) noexcept {

using namespace boost::detail; // For getting default `construct_lexical_cast_in_trait' and `construct_lexical_cast_out_trait'

const auto& in = construct_lexical_cast_in_trait(std::forward<Source>(s)); // ADL

if (in.fail()) return false;

typedef boost::mpl::identity<Target> target_tag;

// Must have up to 4 construct_lexical_cast_out_trait

// {with parameters const char*, const wchar_t*, const char16_t*, const char32_t*}

return construct_lexical_cast_out_trait(target_tag(), in.begin(), in.end()) // ADL

.assign_value(t); // returns true if conversion is OK

}

User will need to add one `construct_lexical_cast_in_trait' and up to four `construct_lexical_cast_out_trait' functions to the namespace of user-defined class.

This will allow us to :

* get correct function pointers via &lexical_cast<Source, Target>; (lexical_cast overloading solution breaks that)

* use stream operators << and >> in default lexical_cast traits

* tune lexical_cast for fast conversions to string types(deduce_char_traits<> specializing solution does not solve that)

* tune lexical_cast for fast conversions from string types(both solutions do not solve that)

* convert from any type to any type through usage of any character type (both solutions do not solve that)

* relax all the type requirements (current implementation does not allow that)

* have a noexcept version (current implementation does not allow that)

Any objections? Did I miss something except conversion tags for specifying base?

Why is lexical_cast so complicated and full of quirks? All that stuff looks messy and over-engineered.

Antony Polukhin

12:24 p.m.

12.04.2012 15:20 "Mathias Gaunard" <mathias.gaunard@ens-lyon.org>:

...

Why is lexical_cast so complicated and full of quirks?

All that stuff looks messy and over-engineered.

That was an example of lexical_cast with all the wishes implemented. The current implementation is simple, and the user is only required to provide stream operators << and >> to work with it.

--
Best regards, Antony Polukhin

Mathias Gaunard

2:22 p.m.

On 12/04/12 14:24, Antony Polukhin wrote:

...

12.04.2012 15:20 "Mathias Gaunard"<mathias.gaunard@ens-lyon.org>:

...
Why is lexical_cast so complicated and full of quirks?

All that stuff looks messy and over-engineered.

That was an example of lexical_cast with all the wishes implemented.

Current implementation is simple, and user only required to provide stream operators << and >> to work with it.

Yes, but apparently, that doesn't work with user-defined wide string types. Having to implement three traits that each must provide conversion facilities to all of char*, wchar_t*, char16_t* and char32_t* seems a bit much. Surely there must be a simpler solution.

Antony Polukhin

4:14 p.m.

2012/4/12 Mathias Gaunard <mathias.gaunard@ens-lyon.org>:

...

On 12/04/12 14:24, Antony Polukhin wrote:

...
Current implementation is simple, and user only required to provide stream operators << and >> to work with it.

Yes, but apparently, that doesn't work with user-defined wide string types.

Having to implement three traits that each must provide conversion facilities to all of char*, wchar_t*, char16_t* and char32_t* seems a bit much.

Surely there must be a simpler solution.

The first (and simplest) solution is: leave everything as is. If lexical_cast fails, use wstream.

The second solution: allow the user to specialize stream_char<>.

Other solutions will add many more complications and introduce an ugly conversion API. Specializing stream_char<> does not solve all the problems: a noexcept version of lexical_cast and fast user-defined conversions are still impossible.

There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. For that solution the following meta-functions are required:

    has_output_stream_operator_for_char<UserType>::value
    has_output_stream_operator_for_wchar_t<UserType>::value
    has_output_stream_operator_for_char16_t<UserType>::value
    has_output_stream_operator_for_char32_t<UserType>::value

Are there any ideas how that can be done in a *portable* way?

--
Best regards, Antony Polukhin

Nathan Ridge

7:41 p.m.

...

There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. For that solution the following meta-functions are required:

    has_output_stream_operator_for_char<UserType>::value
    has_output_stream_operator_for_wchar_t<UserType>::value
    has_output_stream_operator_for_char16_t<UserType>::value
    has_output_stream_operator_for_char32_t<UserType>::value

Are there any ideas how that can be done in a *portable* way?

Perhaps Boost.TTI can be of help? Regards, Nate

Antony Polukhin

13 Apr

4:15 a.m.

2012/4/12 Nathan Ridge <zeratul976@hotmail.com>:

...

...
There is one more solution: determine stream_char<>::type for UserType using lots of meta-programming. For that solution the following meta-functions are required:

    has_output_stream_operator_for_char<UserType>::value
    has_output_stream_operator_for_wchar_t<UserType>::value
    has_output_stream_operator_for_char16_t<UserType>::value
    has_output_stream_operator_for_char32_t<UserType>::value

Are there any ideas how that can be done in a *portable* way?

Perhaps Boost.TTI can be of help?

Boost.TypeTraits looks like a good solution. Using has_left_shift<> and has_right_shift<> it is possible to create the required meta-functions.

Created ticket #6786

--
Best regards, Antony Polukhin
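For readers without Boost.TypeTraits at hand, the detection idea can be approximated with C++11 expression SFINAE. This swaps in a different technique from the has_left_shift<> approach mentioned above, and all names (sketch::is_streamable_with, sketch::deduced_stream_char) are hypothetical. One caveat that follows from the sub_match operator quoted at the top of the thread: that operator is an unconstrained template declared for every stream character type, so a declaration-only check like this one may report it as streamable to a char stream even though the body would not compile.

```cpp
#include <cassert>
#include <ostream>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {
namespace detail {

// Chosen when `basic_ostream<CharT>& << const T&` is a valid expression.
template <class CharT, class T>
auto probe(int) -> decltype(
    void(std::declval<std::basic_ostream<CharT>&>()
         << std::declval<const T&>()),
    std::true_type());

// Fallback when the expression above is ill-formed.
template <class CharT, class T>
std::false_type probe(...);

} // namespace detail

// true_type if T can be inserted into a stream of CharT.
template <class CharT, class T>
struct is_streamable_with : decltype(detail::probe<CharT, T>(0)) {};

// Picks wchar_t only when char insertion is unavailable and wchar_t
// insertion is available; otherwise keeps the char default.
template <class T>
struct deduced_stream_char {
    typedef typename std::conditional<
        !is_streamable_with<char, T>::value &&
         is_streamable_with<wchar_t, T>::value,
        wchar_t, char>::type type;
};

} // namespace sketch
```

With these definitions, std::wstring is detected as streamable only to wide streams, so deduced_stream_char<std::wstring>::type is wchar_t, while int keeps the char default.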

Olaf van der Spek

11 Apr

6:07 p.m.

On Wed, Apr 11, 2012 at 7:12 PM, Eric Niebler <eric@boostpro.com> wrote:

...

On 4/11/2012 9:36 AM, Olaf van der Spek wrote:

...
On Wed, Apr 11, 2012 at 5:18 PM, Eric Niebler <eric@boostpro.com> wrote:

...
...
You have to construct a string somewhere, don't you? If you have one already, you could use iterator_range<const char*> instead to avoid a copy.

Yes, I see that lexical_cast has optimizations for iterator_range<wchar_t const *> and a few other, sufficiently 'string-like' types. But sub_match essentially *is* a string-like iterator_range. (It's a std::pair of iterators.) I'm genuinely surprised

Is it? Does it have begin() and end() for example? Why not use std::iterator_range instead of std::pair?

Because the C++ standard says sub_match should inherit from std::pair. But does it matter? There are other 3rd party types that I'm sure users would like to adapt to lexical_cast. The docs just say a type needs a stream insertion operator. Shouldn't that be sufficient?

Not if you want to avoid copying the data.

...

Besides, using std::iterator_range would not solve the problem because lexical_cast only knows about boost::iterator_range. Do you see?

Actually, I don't think std::iterator_range exists. I probably meant boost::iterator_range.

...

...
...
there's no way to tell lexical_cast that. Instead I have to just know (a) which are the magical types lexical_cast is optimized for and (b) for which it can determine the correct underlying stream character type (hint: the docs are unclear or out of date), and massage my type into one of those before calling lexical_cast. Why?

How could it automatically determine the necessary character type?

It can't be done automatically, but lexical_cast can expose the stream_char trait and make it a documented part of the interface. Users can specialize it for their types.

Maybe, but lexical_cast isn't the only code that wants to consume string-like types. IMO this should be handled in a more general way. -- Olaf
