construe_cast, call for interest and feedback

Hi,

I'm proud to say I tagged version 0.2 of construe_cast a few minutes ago. For those unfamiliar with the library (this is the first time I'm posting it to the boost mailing list), it builds upon boost.spirit to create a cast-like operator, aiming to provide an alternative to lexical_cast with greater speed and flexibility.

The current code can be found at http://github.com/VeXocide/construe_cast and should be compatible with boost version 1.42.0 and up.

What can it do? In short, boost::construe_cast<int>("23") and boost::construe_cast<std::string>(23) (and a lot of other types, basically the ones boost.spirit supports). Secondly, since this release, boost::construe_cast<std::string, tag::bin>(23) results in a std::string("10111"), and vice versa. Users can add functionality here by implementing extra tags; by default tag::bin, tag::oct and tag::hex are implemented, with more to follow.

What I'm hoping to add in the coming weeks:
- Documentation. I'm somewhat ashamed to say this hasn't been written yet; however, it will be tackled shortly.
- Support for different encodings. It currently uses standard (ASCII) and standard_wide for wchar_t, with more to follow.
- A nothrow version. Like lexical_cast, it currently throws when it encounters a run-time error; an interface returning false or a default value will be included (the exact interface hasn't been decided upon yet).

Performance, now this is where it gets hard. I've done some basic benchmarking and it's somewhere between one and five times as fast as lexical_cast, depending on what you're casting. Good compilers (I've noticed this with gcc 4.4 and up, for example) are actually capable of optimizing boost::construe_cast<int>("23") to the integer 23 at compile-time, which makes it tricky to benchmark. If anyone with more benchmarking experience is willing to look into this, please do.

Again, thanks to everyone who helped me get this far, and Hartmut in particular. There seems to be little to add at the moment; I'm looking forward to your opinions and insights.

Regards,
Jeroen Habraken

P.S. I've had a few responses about the name. Construe is an English word (http://www.thefreedictionary.com/construe) and I'm using construe_cast as a working name. If people are ever going to consider this for inclusion with boost, changing the name will surely be the least of my worries.
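The non-throwing interface mentioned above had not been decided when this was posted, so the following is only a sketch of the two shapes it names (returning false, or returning a default value). The names try_parse_int and parse_int_or are hypothetical and are not part of construe_cast; the sketch uses a hand-written digit loop rather than boost.spirit, and assumes int is narrower than long long.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper, not part of construe_cast: parses a base-10 int and
// reports failure through the return value instead of throwing.
inline bool try_parse_int(const std::string& text, int& out)
{
    std::size_t i = 0;
    bool negative = false;
    if (!text.empty() && (text[0] == '-' || text[0] == '+')) {
        negative = (text[0] == '-');
        i = 1;
    }
    if (i == text.size()) return false;  // empty input or a lone sign

    // Largest magnitude that fits: max for positives, max + 1 for negatives.
    const long long limit =
        static_cast<long long>(std::numeric_limits<int>::max()) + (negative ? 1 : 0);

    long long magnitude = 0;
    for (; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') return false;  // not a digit
        magnitude = magnitude * 10 + (text[i] - '0');
        if (magnitude > limit) return false;               // out of range
    }
    out = static_cast<int>(negative ? -magnitude : magnitude);
    return true;
}

// Hypothetical "default value" form built on top of the bool-returning one.
inline int parse_int_or(const std::string& text, int fallback)
{
    int value = 0;
    return try_parse_int(text, value) ? value : fallback;
}
```

A caller that wants lexical_cast-like behaviour could still throw when try_parse_int returns false, so the non-throwing form can serve as the primitive.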

On 2010-10-07 19:38, Jeroen Habraken wrote:
Hi,
I'm proud to say I've tagged a version 0.2 of construe_cast some minutes ago. For those unfamiliar with the library (it's the first time I'm posting this to the boost mailing list), it's a library which builds upon boost.spirit to create a cast-like operator aiming to provide an alternative to lexical_cast providing greater speed and flexibility.
[snip]
I'm interested. There have been book-length discussions on String Convert by Vladimir Batov, currently in the review queue. Have you looked at this? See http://www.boostpro.com/vault/index.php?action=downloadfile&filename=boost-string-convert.zip
In what way does your work differ?
Cheers,
Rutger

On 7 October 2010 20:16, Rutger ter Borg <rutger@terborg.net> wrote:
On 2010-10-07 19:38, Jeroen Habraken wrote:
[snip]
I'm interested. There have been book-length discussions on String Convert by Vladimir Batov, currently in the review queue. Have you looked at this? See
http://www.boostpro.com/vault/index.php?action=downloadfile&filename=boost-string-convert.zip
Yes, I'm aware, and have taken some inspiration from it.
In what way does your work differ?
boost.convert still bases its conversions on streams, just as lexical_cast does, and streams are relatively slow. This library, on the other hand, uses boost.spirit's auto functionality to parse and generate strings.
Cheers,
Rutger
Jeroen

It would be nice to have
construe_cast(lhs, rhs);
so that there is no temporary copy for things like std::string on the lhs and type inference can be used to avoid writing the type name.
Also, are you adding array of char as an input specialization? I like the way it avoids the strlen for const char literals on the rhs.
Regards,
Matt ------
On 08/10/2010, at 4:38, Jeroen Habraken <vexocide@gmail.com> wrote:
[snip]

On 7 October 2010 22:59, Matthew Herrmann <matthew.herrmann@zomojo.com> wrote:
It would be nice to have
construe_cast(lhs, rhs);
so that there is no temporary copy for things like std::string on the lhs and type inferencing can be used to avoid writing the type name.
This is indeed a feature I wish to add (possibly with a somewhat different interface).
Also are you adding array of char as an input specialization? I like the way it avoids the strlen for const char literals on the rhs.
Yes I am, the specific specializations can be found at http://github.com/VeXocide/construe_cast/blob/master/boost/construe/iterable... lines 120 to 165
Regards,
Matt ------
Jeroen

Jeroen Habraken <vexocide <at> gmail.com> writes:
On 7 October 2010 22:59, Matthew Herrmann <matthew.herrmann <at> zomojo.com>
wrote:
Also are you adding array of char as an input specialization? I like the way it avoids the strlen for const char literals on the rhs.
Yes I am, the specific specializations can be found at
http://github.com/VeXocide/construe_cast/blob/master/boost/construe/iterable... p
lines 120 to 165
The fixed-size array-of-char specializations seem like a good idea, but the user can fall into the same trap as the (v3) filesystem constructor does (trac #4640), e.g.:
char buffer[100];
strcpy(buffer, "100");
construe_cast<int>(buffer);
Depending on how the size (100) is taken into account, this may or may not be an issue.

On 11 October 2010 10:34, Richard Hazlewood <boost@hazlewoods.eu> wrote:
Jeroen Habraken <vexocide <at> gmail.com> writes:
On 7 October 2010 22:59, Matthew Herrmann <matthew.herrmann <at> zomojo.com>
wrote:
Also are you adding array of char as an input specialization? I like the way it avoids the strlen for const char literals on the rhs.
Yes I am, the specific specializations can be found at
http://github.com/VeXocide/construe_cast/blob/master/boost/construe/iterable... p
lines 120 to 165
The fixed-size array-of-char specializations seem like a good idea, but (the user) can fall into the same trap as the (v3) filesystem constructor does (trac #4640), e.g.:
char buffer[100]; strcpy(buffer, "100"); construe_cast<int>(buffer);
Depending on how the size (100) is taken into account, this may or may not be an issue.
Since the string is null-terminated this is not a problem. If, for instance, you cast a buffer holding "123", a null character and then "456" to an int, it will succeed and return 123; as far as I'm aware this behaviour is similar to lexical_cast.
Jeroen
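The behaviour described above can be sketched with the standard library alone. This is an illustration of the idea, not the code in the construe_cast repository: parse_digits is a hypothetical name, it handles unsigned base-10 digits only, and it does no overflow or error checking. The array size N is deduced at compile time, so a string literal needs no strlen call, while the search for the first null character makes a partly filled buffer behave like a C string.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical overload for fixed-size char arrays (illustration only).
// N comes from the array type; the effective end of the input is the first
// '\0' inside the array, or the end of the array if there is none.
template <std::size_t N>
int parse_digits(const char (&buffer)[N])
{
    const char* first = buffer;
    const char* last = std::find(buffer, buffer + N, '\0');

    int value = 0;
    // Stops at the first non-digit; a real implementation would report an
    // error for trailing junk and guard against overflow.
    for (; first != last && *first >= '0' && *first <= '9'; ++first)
        value = value * 10 + (*first - '0');
    return value;
}
```

With this shape, Richard's char buffer[100] holding "100" yields 100 rather than tripping over the 96 unused bytes, because everything after the first null character is ignored.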

On 07/10/2010 22:04, Jeroen Habraken wrote:
On 7 October 2010 22:59, Matthew Herrmann<matthew.herrmann@zomojo.com> wrote:
It would be nice to have
construe_cast(lhs, rhs);
so that there is no temporary copy for things like std::string on the lhs and type inferencing can be used to avoid writing the type name.
This is indeed a feature I wish to add (possibly with a somewhat different interface).
Also are you adding array of char as an input specialization? I like the way it avoids the strlen for const char literals on the rhs.
Yes I am, the specific specializations can be found at http://github.com/VeXocide/construe_cast/blob/master/boost/construe/iterable... lines 120 to 165
If the function is inlined a call to strlen will probably be evaluated at compile-time anyway.

It would be nice to have
construe_cast(lhs, rhs);
so that there is no temporary copy for things like std::string on the lhs and type inferencing can be used to avoid writing the type name.
I'd like to see some measurements first, as I'm not sure that cluttering the API for the sake of premature optimization is a good idea. RVO/NRVO and move semantics should do a very good job of making the proposed API efficient enough for all intents and purposes.
Regards Hartmut
---------------
http://boost-spirit.com
Regards,
Matt ------
On 08/10/2010, at 4:38, Jeroen Habraken <vexocide@gmail.com> wrote:
[snip]
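For anyone who wants to take the measurements Hartmut asks for, the two interface shapes under discussion look roughly like this. It is a minimal sketch with a hypothetical to_decimal function standing in for the cast; it says nothing about which form is faster, only what would be compared.

```cpp
#include <algorithm>
#include <string>

// Out-parameter form (the construe_cast(lhs, rhs) shape Matt suggested).
// clear() keeps the string's existing capacity, so a reused 'out' may avoid
// a fresh allocation.
inline void to_decimal(std::string& out, unsigned value)
{
    out.clear();
    do {
        out.push_back(static_cast<char>('0' + value % 10));
        value /= 10;
    } while (value != 0);
    std::reverse(out.begin(), out.end());
}

// Return-by-value form (the shape the library already has). The named local
// is eligible for NRVO; where that is not applied, C++11 moves it instead of
// copying.
inline std::string to_decimal(unsigned value)
{
    std::string result;
    to_decimal(result, value);
    return result;
}
```

A benchmark would call both in a loop with inputs the compiler cannot see through, since (as noted in the original post) constant arguments may be folded away at compile time.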

Secondly, since this release, boost::construe_cast<std::string, tag::bin>(23) resulting in a std::string("10111") and vice versa. The user has the ability to add functionality here by implementing extra tags, by default tag::bin, tag::oct and tag::hex are implemented, with more to follow.
I use boost::lexical_cast for conversions from string. For conversions to string I rarely use boost::lexical_cast. Most of the time I need formatted conversions, and I use boost::format:
lexical_cast<int>( "12" )
lexical_cast<string>( 12 )
str( format("0x%04x") % 12 )
When I need control over output format, I usually need more than just tag::hex, etc., and I like the printf/boost::format syntax for it. The alternative I dislike:
( ostringstream() << "0x" << hex << setw(4) << setfill('0') << 12 ).str()
I once had to fall back from boost::format to sprintf for efficiency reasons.
Conclusions:
1. For conversions from string I consider lexical_cast<T>( string ) sufficient, but I can imagine efficiency can be improved, if measurements say so.
2. For unformatted conversions to string: same as point 1.
3. For formatted conversions to string: a more efficient boost::format would be something I would like to see (just a user's point of view, I have no idea how this could be implemented).
Regards
Kris
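To make the comparison above concrete without depending on boost, here is a sketch of the two standard-library routes Kris mentions: the stream manipulators he dislikes and the printf-family fallback. The function names are made up for this example; boost::format itself is left out so the snippet stays self-contained.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Stream-based formatting: "0x" followed by the value in hex, zero-padded
// to four characters. setw applies to the next inserted item only.
inline std::string hex_with_streams(unsigned value)
{
    std::ostringstream out;
    out << "0x" << std::hex << std::setw(4) << std::setfill('0') << value;
    return out.str();
}

// printf-style formatting of the same value with the same layout.
inline std::string hex_with_snprintf(unsigned value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "0x%04x", value);
    return std::string(buffer);
}
```

Both return "0x000c" for 12, which is the output the format string "0x%04x" in the message describes.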

On 8 October 2010 14:18, Krzysztof Czainski <1czajnik@gmail.com> wrote:
Secondly, since this release, boost::construe_cast<std::string, tag::bin>(23) resulting in a std::string("10111") and vice versa. The user has the ability to add functionality here by implementing extra tags, by default tag::bin, tag::oct and tag::hex are implemented, with more to follow.
I use boost::lexical_cast for conversions from string. For conversions to string I rarely use boost::lexical_cast. Most of the time I need formatted conversions, and I use boost::format:
lexical_cast<int>( "12" ) lexical_cast<string>( 12 ) str( format("0x%04x") % 12 )
When I need control over output format, I usually need more than just tag::hex, etc., and I like the printf/boost::format syntax for it. The alternative I dislike: ( ostringstream() << "0x" << hex << setw(4) << setfill('0') << 12 ).str()
Once I experienced having to fall back from boost::format to sprintf for efficiency reasons.
Conclusions: 1. For conversions from string I consider lexical_cast<T>( string ) sufficient, but I can imagine efficiency can be improved - if measurements say so. 2. For unformatted conversions to string: same as point 1.
I agree, that's the scope of this library. The tags are there as an extensibility point, and over time some common and convenient tags will be added, imagine a tag::no_case when you want to accept "True" or "TRUE" besides "true" for example.
3. For formatted conversions to string: a more efficient boost::format would be something I would like to see (just a user's point of view, I have no idea how this could be implemented).
There already is a library in boost capable of this, boost.spirit's Karma, which construe_cast is partly built upon. It's not as trivial to use as boost.format but considerably faster, and I can highly recommend it. Please see http://www.boost.org/doc/libs/1_44_0/libs/spirit/doc/html/index.html for documentation.
Regards Kris
Regards, Jeroen
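The idea of tags as an extensibility point can be sketched as follows. This is not how construe_cast implements them (it builds on boost.spirit); to_string_as, from_string_as and the base member are names invented for this illustration, and the tag namespace only mirrors the tag::bin, tag::oct and tag::hex names from the announcement.

```cpp
#include <cstddef>
#include <string>

// Tags carry the conversion policy as a type; here the policy is just a base.
namespace tag {
struct bin { static const unsigned base = 2; };
struct oct { static const unsigned base = 8; };
struct hex { static const unsigned base = 16; };
}

// A user-defined tag: adding one requires no change to the functions below.
struct quaternary { static const unsigned base = 4; };

// Hypothetical number-to-string direction, selected by tag.
template <typename Tag>
std::string to_string_as(unsigned value)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    do {
        out.insert(out.begin(), digits[value % Tag::base]);
        value /= Tag::base;
    } while (value != 0);
    return out;
}

// Hypothetical string-to-number direction. No validation: it assumes the
// input holds only lower-case digits that are valid for the tag's base.
template <typename Tag>
unsigned from_string_as(const std::string& text)
{
    unsigned value = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        const unsigned digit = (c >= '0' && c <= '9')
            ? static_cast<unsigned>(c - '0')
            : static_cast<unsigned>(c - 'a' + 10);
        value = value * Tag::base + digit;
    }
    return value;
}
```

With these definitions to_string_as<tag::bin>(23) gives "10111", matching the example in the announcement, and from_string_as<tag::bin>("10111") gives 23 back. A tag such as the tag::no_case mentioned above would need a richer policy than a single base, which this sketch does not attempt.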

On Fri, Oct 8, 2010 at 1:38 AM, Jeroen Habraken <vexocide@gmail.com> wrote:
Hi,
I'm proud to say I've tagged a version 0.2 of construe_cast some minutes ago. For those unfamiliar with the library (it's the first time I'm posting this to the boost mailing list), it's a library which builds upon boost.spirit to create a cast-like operator aiming to provide an alternative to lexical_cast providing greater speed and flexibility.
I'm definitely interested and I look forward to seeing this brought to fruition. I think it's about time we had a better value casting utility in C++. Keep the great work coming Jeroen! :)
--
Dean Michael Berris
deanberris.com

On Thu, Oct 7, 2010 at 6:38 PM, Jeroen Habraken <vexocide@gmail.com> wrote:
Hi,
I'm proud to say I've tagged a version 0.2 of construe_cast some minutes ago. For those unfamiliar with the library (it's the first time I'm posting this to the boost mailing list), it's a library which builds upon boost.spirit to create a cast-like operator aiming to provide an alternative to lexical_cast providing greater speed and flexibility.
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes. Also, I won't need to stop and think which one I should be using.
The other comment I have is about compilation times with Spirit. Would a client of construe_cast be pulling in the Spirit headers? If so, my compiler (VC9/VS2008) would take a lot longer to build the compilation unit. I don't mind this overhead for source files that are using Spirit to do non-trivial parsing, but for casts to/from ints and strings, which are very common, I wouldn't want to wait for such long compiles.
Regards,
Pete

On 09/10/2010 13:42, PB wrote:
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
The problem is that lexical_cast is a bit different since it takes into account C++ locales.

On 9 October 2010 14:50, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 09/10/2010 13:42, PB wrote:
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
The problem is that lexical_cast is a bit different since it takes into account C++ locales.
Yes, this was bound to be brought up. There is no way I can implement locale support, not in the short run anyway, yet I don't believe this to be a major problem, as locales are slow. The library is designed to be extensible, and if you want numbers to be formatted in a specific way, this should be possible.
Jeroen
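The locale dependence being discussed comes from the stream machinery that lexical_cast is based on, as noted earlier in the thread. The sketch below shows the effect with a plain std::ostringstream and a custom numpunct facet; grouped_digits and stream_format are names made up for the example, and it does not call lexical_cast or construe_cast.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in threes with ','.
struct grouped_digits : std::numpunct<char>
{
    char do_thousands_sep() const { return ','; }
    std::string do_grouping() const { return "\3"; }
};

// Formats an int through a stream imbued with the given locale, which is
// the path any stream-based conversion takes.
inline std::string stream_format(int value, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

With the classic "C" locale, 1234567 comes out as "1234567"; with a locale built as std::locale(std::locale::classic(), new grouped_digits) the same call yields "1,234,567". A converter that bypasses streams always produces the first form unless it offers its own formatting hook, which is the trade-off raised here.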

On Sat, Oct 9, 2010 at 4:26 PM, Jeroen Habraken <vexocide@gmail.com> wrote:
On 9 October 2010 14:50, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 09/10/2010 13:42, PB wrote:
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
The problem is that lexical_cast is a bit different since it takes into account C++ locales.
Yes, this was bound to be brought up. There is no way I can implement them, not in the short run anyways, yet I don't believe this to be a major problem as locales are slow. The library is designed to be extensible and if you want numbers to be formatted in a specific way, this should be possible.
Jeroen, I now understand a bit more the difference between lexical_cast and construe_cast and agree with Vicente that it'd be good to have the choice between runtime/compile improvements.
Thanks,
Pete

On 09/10/2010 13:50, Mathias Gaunard wrote:
On 09/10/2010 13:42, PB wrote:
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
The problem is that lexical_cast is a bit different since it takes into account C++ locales.
I think the important question is the following: Do we care about C++ locales support in lexical_cast? Are people fine with scrapping lexical_cast and having construe_cast take its name?

On 10/11/2010 01:23 PM, Mathias Gaunard wrote:
I think the important question is the following:
Do we care about C++ locales support in lexical_cast?
Not I. If I could get std::string and std::stream without locales, I'd use them. (Even better might be an STL without allocators, but that's a different question.)
Are people fine scrapping lexical_cast and have construe_cast take its name?
I'm not a heavy user of lexical_cast, but want to echo the concern about a potential increase in build times. I attempted a small project with Spirit once and the compile time was noticeable. While it was acceptable for a translation unit producing a high-tech parser, I wouldn't want that overhead creeping into everything through some seemingly innocuous string conversion functions. - Marsh

On 11/10/2010 22:08, Marsh Ray wrote:
On 10/11/2010 01:23 PM, Mathias Gaunard wrote:
I think the important question is the following:
Do we care about C++ locales support in lexical_cast?
Not I. If I could get std::string and std::stream without locales, I'd use them. (Even better might be an STL without allocators, but that's a different question.)
Are people fine scrapping lexical_cast and have construe_cast take its name?
I'm not a heavy user of lexical_cast, but want to echo the concern about a potential increase in build times.
I attempted a small project with Spirit once and the compile time was noticeable. While it was acceptable for a translation unit producing a high-tech parser, I wouldn't want that overhead creeping into everything through some seemingly innocuous string conversion functions.
I suppose it would be less of a problem if the use of precompiled headers were more widespread. The template definition could be exported, too, preventing the instantiations altogether.

----- Original Message -----
From: "PB" <newbarker@gmail.com>
To: <boost@lists.boost.org>
Sent: Saturday, October 09, 2010 2:42 PM
Subject: Re: [boost] construe_cast, call for interest and feedback
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
The other comment I have is about compilation times with Spirit. Would a client of construe_cast be pulling in the Spirit headers? If so, my compiler (VC9/VS2008) would take a lot longer to build the compilation unit. I don't mind this overhead for source files that are using Spirit to do non-trivial parsing, but for casts to/from ints and strings which are very common, I wouldn't want to wait for such long compiles.
I think this is the major advantage of making a different class: you will be able to choose run-time versus compile-time improvements.
Vicente

On 9 October 2010 14:42, PB <newbarker@gmail.com> wrote:
Rather than introducing another casting method, could lexical_cast be reworked internally to build upon Spirit? That way previous investments in lexical_cast will just work faster with no changes, Also, I won't need to stop and think which one I should be using.
This is something for the boost community to decide when it's time for it to be reviewed. I don't know if I'm able to create something that's fully backwards compatible with lexical_cast (probably not).
The other comment I have is about compilation times with Spirit. Would a client of construe_cast be pulling in the Spirit headers? If so, my compiler (VC9/VS2008) would take a lot longer to build the compilation unit. I don't mind this overhead for source files that are using Spirit to do non-trivial parsing, but for casts to/from ints and strings which are very common, I wouldn't want to wait for such long compiles.
Compiling a simple example takes about 3.7 seconds on my machine with gcc 4.5.1. Adam Merz tested it with VS2010 and was pleasantly surprised by the time it took to compile; he expected worse (I'm afraid I don't have any real numbers, as I currently don't have a Windows machine around). Whether or not the runtime gains are worth the longer compilation times is ultimately for you to decide.
Regards,
Pete
Jeroen
participants (11)
- Dean Michael Berris
- Hartmut Kaiser
- Jeroen Habraken
- Krzysztof Czainski
- Marsh Ray
- Mathias Gaunard
- Matthew Herrmann
- PB
- Richard Hazlewood
- Rutger ter Borg
- vicente.botet