lexical_cast optimization

Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me. Thoughts?
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
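The dispatching idea can be sketched roughly as follows. This is only an illustration of the proposal, not lexical_cast's actual code: `to_text` is a hypothetical name, the generic overload stands in for a stream-based general path, and the `int` overload is the kind of fast path the post suggests, here using `snprintf`.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Generic fallback (hypothetical helper): format through a stream,
// which works for any type with operator<<.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Fast path for int: skip the stream and format with snprintf.
// A non-template overload is preferred over the template for int arguments.
// Assumption: no locale-specific formatting is expected by the caller.
inline std::string to_text(int value) {
    char buffer[32];  // ample room for any int in decimal, plus sign and NUL
    const int length = std::snprintf(buffer, sizeof buffer, "%d", value);
    return std::string(buffer, static_cast<std::size_t>(length));
}
```

Calls such as `to_text(42)` pick the `snprintf` overload, while `to_text(3.5)` still goes through the stream.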

AMDG
David Abrahams wrote:
> Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me.
I think that lexical_cast already does something similar.
In Christ,
Steven Watanabe

2008/12/3 David Abrahams <dave@boostpro.com>
> Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me.
> Thoughts?
You may find the following interesting:
http://lists.boost.org/Archives/boost/2007/11/131001.php
I'd quite like to know if Johan got these changes into the library. Any news on this?
Regards,
Darren

Darren Garvey wrote:
> 2008/12/3 David Abrahams <dave@boostpro.com>
>> Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me.
> You may find the following interesting:
> http://lists.boost.org/Archives/boost/2007/11/131001.php
> I'd quite like to know if Johan got these changes into the library. Any news on this?
Please do not proceed at the expense of error handling. I believe using itoa and ftoa is fine, but using atoi and atof is definitely not:
    cout << atoi("123X") prints 123
    cout << atoi("X") prints 0
lexical_cast<int> throws in both cases, which I think is 100% the correct behaviour.
--> Mika Heiskanen
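To make the difference concrete, here is a small sketch contrasting `std::atoi` with a strict conversion that rejects partial matches. `strict_to_int` is a hypothetical helper written for this illustration; it is loosely modelled on the throwing behaviour described above, not on lexical_cast's implementation.

```cpp
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical strict conversion: succeeds only if the entire input is
// consumed as an integer, otherwise throws std::invalid_argument.
inline int strict_to_int(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> std::noskipws >> value;  // do not silently skip leading whitespace
    // Fail if nothing could be parsed, or if characters are left over.
    if (in.fail() || in.peek() != std::istringstream::traits_type::eof()) {
        throw std::invalid_argument("not an integer: " + text);
    }
    return value;
}
```

With this helper, `"123"` converts normally, while `"123X"` and `"X"` both throw, whereas `std::atoi` quietly returns 123 and 0 for them.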

Hi,
> Please do not proceed at the expense of error handling. I believe using itoa and ftoa is fine, but using atoi and atof is definitely not:
> cout << atoi("123X") prints 123
> cout << atoi("X") prints 0
> lexical_cast<int> throws in both cases, which I think is 100% the correct behaviour.
Would strtol() and friends be suitable from a performance point of view? They at least provide some error reporting.
-Ossama
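For reference, the error reporting that strtol() offers can be used along these lines. This is a minimal sketch of the checks involved, with `parse_int_strtol` as a hypothetical helper name; it says nothing about performance, which is the open question in the message above.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: returns true and stores the value only if text is a
// complete base-10 integer that fits in an int.
inline bool parse_int_strtol(const char* text, int& result) {
    char* end = 0;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text) return false;      // no digits were consumed ("X", "")
    if (*end != '\0') return false;     // trailing characters ("123X")
    if (errno == ERANGE) return false;  // value does not fit in a long
    if (value < INT_MIN || value > INT_MAX) return false;  // does not fit in an int
    result = static_cast<int>(value);
    return true;
}
```

Note that strtol() itself skips leading whitespace and accepts a leading `+`, so this sketch is still more permissive than a conversion that rejects such input.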

Darren Garvey wrote:
> 2008/12/3 David Abrahams <dave@boostpro.com>
>> Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me.
>> Thoughts?
> You may find the following interesting:
> http://lists.boost.org/Archives/boost/2007/11/131001.php
> I'd quite like to know if Johan got these changes into the library. Any news on this?
I haven't looked at the implementation, but I know Matthew Wilson had a series of articles in CUJ about fast string to integer conversions. IIRC, he was doing much better than the functions from the C library:
http://www.ddj.com/cpp/184403874
I guess it might be possible to optimize the floating point casts too, although it might be somewhat harder.
-Thorsten

This implementation of itoa is rather fast and open source, based on a lookup table of groups of four digits:
https://sourceforge.net/projects/itoa
"Fast implementation of the old deprecated itoa(), lltoa() etc... plus some C++ wrapping. Theoretically controversed but practically very useful."
The source is visible here, with a humble attempt to integrate it into Boost:
http://itoa.cvs.sourceforge.net/viewvc/itoa/itoa/
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
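The general idea of converting through a table of four-digit groups might look like the sketch below. It is not taken from the SourceForge project linked above; the names `four_digit_table` and `unsigned_to_string` and the table layout are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Builds the strings "0000".."9999" once, stored as 4 characters per entry.
inline const std::vector<char>& four_digit_table() {
    static const std::vector<char> table = [] {
        std::vector<char> t(10000 * 4);
        for (unsigned n = 0; n < 10000; ++n) {
            t[n * 4 + 0] = static_cast<char>('0' + n / 1000);
            t[n * 4 + 1] = static_cast<char>('0' + n / 100 % 10);
            t[n * 4 + 2] = static_cast<char>('0' + n / 10 % 10);
            t[n * 4 + 3] = static_cast<char>('0' + n % 10);
        }
        return t;
    }();
    return table;
}

// Converts by peeling off four decimal digits per division instead of one.
inline std::string unsigned_to_string(unsigned long long value) {
    const std::vector<char>& table = four_digit_table();
    char buffer[24];  // 20 digits cover 64 bits; filled from the end
    char* const end = buffer + sizeof buffer;
    char* pos = end;
    while (value >= 10000) {
        const unsigned group = static_cast<unsigned>(value % 10000);
        value /= 10000;
        pos -= 4;
        for (int i = 0; i < 4; ++i) pos[i] = table[group * 4 + i];
    }
    // Leading group (0..9999), written with padding that is trimmed below.
    const unsigned lead = static_cast<unsigned>(value);
    pos -= 4;
    for (int i = 0; i < 4; ++i) pos[i] = table[lead * 4 + i];
    while (pos + 1 < end && *pos == '0') ++pos;  // drop leading zeros, keep one digit
    return std::string(pos, end);
}
```

The trade-off is a 40,000-byte table in exchange for a quarter of the divisions; whether that wins in practice depends on cache behaviour and would need measuring.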

We have found that atoi/strtol/strtoll/strtod are slow compared to the optimized output (gcc -O2/-O3) of a very simple * 10 loop. The C number converters work for a large variety of bases, which makes them slower than needed for base 10.
We use something like:
    #include <stddef.h>  // size_t
    // Parses exactly `size` decimal digits from str_value into result.
    // Returns false at the first non-digit character.
    // Note: result is accumulated into as passed in, so the caller must
    // set it to 0 first; there is no overflow check.
    template<typename TInteger>
    bool validated_to_unsigned_int(const char* str_value, size_t size, TInteger& result)
    {
        size_t i = 0;
        for (const char* p = str_value; i < size; ++p, ++i)
        {
            char digit = *p;
            if (digit >= '0' && digit <= '9')
            {
                result = result * 10 + digit - '0';
            }
            else
            {
                return false;
            }
        }
        return true;
    }
with wrappers for signed and ones that throw exceptions. We have a few converters that are faster, but only without error checking (base 16 conversion with a lookup table works for small numbers, for example).
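The wrappers mentioned above are not shown in the post. One possible shape, under the assumption that empty input and overflow should also be rejected, is sketched here; `checked_to_unsigned` and `to_int_or_throw` are hypothetical names, not the poster's code.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Same * 10 loop, but rejecting empty input and overflow.
template <typename TUnsigned>
bool checked_to_unsigned(const char* text, std::size_t size, TUnsigned& result) {
    if (size == 0) return false;
    const TUnsigned max = std::numeric_limits<TUnsigned>::max();
    TUnsigned value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        const char c = text[i];
        if (c < '0' || c > '9') return false;
        const TUnsigned digit = static_cast<TUnsigned>(c - '0');
        // value * 10 + digit <= max  is equivalent to  value <= (max - digit) / 10
        if (value > (max - digit) / 10) return false;
        value = static_cast<TUnsigned>(value * 10 + digit);
    }
    result = value;
    return true;
}

// Signed wrapper that throws instead of returning a status.
inline int to_int_or_throw(const std::string& text) {
    const bool negative = !text.empty() && text[0] == '-';
    const std::size_t skip = negative ? 1 : 0;
    unsigned magnitude = 0;
    if (!checked_to_unsigned(text.data() + skip, text.size() - skip, magnitude)) {
        throw std::invalid_argument("not an integer: " + text);
    }
    // The most negative int has a magnitude one larger than the largest int.
    const unsigned limit =
        static_cast<unsigned>(std::numeric_limits<int>::max()) + (negative ? 1u : 0u);
    if (magnitude > limit) {
        throw std::out_of_range("integer out of range: " + text);
    }
    return negative ? static_cast<int>(-static_cast<long long>(magnitude))
                    : static_cast<int>(magnitude);
}
```

The extra comparison per digit costs something, so this variant trades a little of the loop's speed for the error handling asked for earlier in the thread.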

This variant of the loop Jonathan Brannan posted is about 10% faster on a Pentium (it counts size down instead of maintaining a separate index i):
    {
        for (const char* p = str_value; size; ++p, --size)
        {

David Abrahams <dave <at> boostpro.com> writes:
> Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi ftoa/atof where available, and even sprintf. Seems like a great idea to me.
This can be easily done only if a program always uses the classic C and C++ locales. Optimization for some combinations of types is already available if you set BOOST_LEXICAL_CAST_ASSUME_C_LOCALE. Mapping between C and C++ locales is hard, if possible at all in a portable way.
-- Alex
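The locale concern can be illustrated without relying on any locale installed on the system. In this sketch, `grouped_digits` is a hypothetical facet that groups digits in threes; a stream imbued with it formats integers differently from `snprintf`, which does not see the C++ stream's locale at all.

```cpp
#include <cstddef>
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: separate groups of three digits with ','.
struct grouped_digits : std::numpunct<char> {
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Stream-based formatting honours the locale imbued on the stream.
inline std::string via_stream(int value, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

// snprintf knows nothing about C++ locale objects or their facets.
inline std::string via_snprintf(int value) {
    char buffer[32];
    const int length = std::snprintf(buffer, sizeof buffer, "%d", value);
    return std::string(buffer, static_cast<std::size_t>(length));
}
```

With `std::locale grouped(std::locale::classic(), new grouped_digits)`, the stream path yields `1,234,567` for 1234567 while the `snprintf` path yields `1234567`, so a fast path is only a drop-in replacement when the classic locale is in effect.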
participants (9)
- Alexander Nasonov
- Darren Garvey
- David Abrahams
- Jonathan Brannan
- Mika Heiskanen
- Othman, Ossama
- remi.chateauneu@gmx.de
- Steven Watanabe
- Thorsten Ottosen