lexical_cast optimization

Someone was asking me today about the inefficiency of lexical_cast for its common itoa/atoi-like usage, and it occurred to me that it could *easily* be optimized to handle the common cases by dispatching to itoa/atoi, ftoa/atof where available, and even sprintf. Seems like a great idea to me. Thoughts?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
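The dispatch idea can be sketched roughly as follows. This is only an illustration, not how Boost implements lexical_cast: the name to_string_sketch is hypothetical, and because itoa is not part of standard C or C++, the fast path here uses snprintf instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Generic fallback: stream-based conversion, similar in spirit to what a
// lexical_cast-style function does for arbitrary streamable types.
// (Hypothetical name, not part of Boost.)
template <typename Source>
std::string to_string_sketch(const Source& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Fast path for int: when the argument is exactly an int, overload
// resolution prefers this non-template overload over the template above.
inline std::string to_string_sketch(int value) {
    char buf[32];  // room for sign, digits and terminator of a 64-bit int
    const int len = std::snprintf(buf, sizeof buf, "%d", value);
    assert(len > 0 && len < static_cast<int>(sizeof buf));
    return std::string(buf, static_cast<std::size_t>(len));
}
```

Calling to_string_sketch(-42) takes the snprintf path, while to_string_sketch(2.5) still goes through the stream, so types without a fast path keep their existing behaviour.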

2008/12/3 David Abrahams <dave@boostpro.com>
You may find the following interesting: http://lists.boost.org/Archives/boost/2007/11/131001.php I'd quite like to know if Johan got these changes into the library. Any news on this? Regards, Darren

Darren Garvey wrote:
Please do not proceed at the expense of error handling. I believe using itoa and ftoa is fine, but using atoi and atof is definitely not:

    cout << atoi("123X")    prints 123
    cout << atoi("X")       prints 0

lexical_cast<int> throws in both cases, which I think is 100% the correct behaviour.

--> Mika Heiskanen
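One way to keep a C-library fast path without losing the error handling described above is to use strtol and inspect its end pointer, which atoi does not expose. The helper name strict_to_int is hypothetical, and this is a sketch rather than what lexical_cast does internally. Note that strtol skips leading whitespace, and the sketch does not reject that.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical strict wrapper around strtol: throws instead of silently
// returning 0 or a partial result the way atoi does.
int strict_to_int(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(begin, &end, 10);

    if (end == begin) {
        throw std::invalid_argument("no digits in: " + text);
    }
    if (end != begin + text.size()) {
        throw std::invalid_argument("trailing characters in: " + text);
    }
    if (errno == ERANGE ||
        value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max()) {
        throw std::out_of_range("value does not fit in int: " + text);
    }
    return static_cast<int>(value);
}
```

With this wrapper, both "123X" and "X" raise std::invalid_argument, while "123" converts normally.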

Darren Garvey wrote:
I haven't looked at the implementation, but I know Matthew Wilson had a series of articles in CUJ about fast string to integer conversions. IIRC, he was doing much better than the functions from the C library: http://www.ddj.com/cpp/184403874 I guess it might be possible to optimize the floating point casts too, although that might be somewhat harder. -Thorsten

This implementation of itoa is rather fast and open-source, based on a lookup table of groups of four digits: https://sourceforge.net/projects/itoa "Fast implementation of the old deprecated itoa(), lltoa() etc... plus some C++ wrapping. Theoretically controversed but practically very useful." The source is visible here, with a humble attempt to integrate it into Boost: http://itoa.cvs.sourceforge.net/viewvc/itoa/itoa/

Thorsten Ottosen wrote:
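To illustrate the lookup-table technique mentioned for the itoa project, here is a minimal sketch. It is not the code from that project: it uses groups of two digits rather than four to keep the table short, and the function name u32_to_string is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of table-driven integer formatting. Each loop
// iteration emits two digits with one division instead of two.
std::string u32_to_string(std::uint32_t value) {
    // "00" "01" ... "99" concatenated: the pair for n starts at offset 2*n.
    static const char pairs[] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";

    char buf[16];                 // 10 digits are enough for 32 bits
    char* const end = buf + sizeof buf;
    char* p = end;                // digits are written right to left

    while (value >= 100) {
        const std::uint32_t idx = (value % 100) * 2;
        value /= 100;
        *--p = pairs[idx + 1];
        *--p = pairs[idx];
    }
    if (value >= 10) {            // two leading digits remain
        const std::uint32_t idx = value * 2;
        *--p = pairs[idx + 1];
        *--p = pairs[idx];
    } else {                      // a single leading digit remains
        *--p = static_cast<char>('0' + value);
    }
    assert(p >= buf);
    return std::string(p, end);
}
```

A four-digit table follows the same pattern with a 10000-entry table and division by 10000, trading memory for fewer divisions.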

We have found that atoi/strtol/strtoll/strtod are slow compared to the optimized output (gcc -O2/3) of a very simple * 10 loop. The C number converters work for a large variety of bases, which makes them slower than needed for base 10. We use something like:

    #include <cstddef>  // size_t

    template<typename TInteger>
    bool validated_to_unsigned_int(const char* str_value, size_t size, TInteger& result)
    {
        result = 0;  // start from zero so a stale caller value cannot leak in
        size_t i = 0;
        for (const char* p = str_value; i < size; ++p, ++i) {
            char digit = *p;
            if (digit >= '0' && digit <= '9') {
                // no overflow check: the caller must bound the input length
                result = result * 10 + digit - '0';
            } else {
                return false;  // reject any non-digit character
            }
        }
        return true;
    }

with wrappers for signed and ones that throw exceptions. We have a few converters that are faster, but only without error checking (base 16 conversion with a lookup table works for small numbers, for example).
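The post mentions wrappers for signed values and ones that throw, but does not show them. A possible shape for such a wrapper is sketched below. The name parse_int_or_throw and the choice of exception types are assumptions, and the overflow check is an addition that the loop in the post does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical signed, throwing variant of the "* 10" loop.
int parse_int_or_throw(const char* str, std::size_t size) {
    if (str == nullptr || size == 0) {
        throw std::invalid_argument("empty input");
    }
    std::size_t i = 0;
    const bool negative = (str[0] == '-');
    if (negative || str[0] == '+') {
        ++i;  // skip the sign character
    }
    if (i == size) {
        throw std::invalid_argument("sign without digits");
    }

    // Accumulate as a negative number so the most negative int is reachable.
    const int min = std::numeric_limits<int>::min();
    int result = 0;
    for (; i < size; ++i) {
        const char c = str[i];
        if (c < '0' || c > '9') {
            throw std::invalid_argument("non-digit character");
        }
        const int digit = c - '0';
        // result * 10 - digit would go below min: report overflow instead.
        if (result < (min + digit) / 10) {
            throw std::out_of_range("value does not fit in int");
        }
        result = result * 10 - digit;
    }
    if (!negative) {
        if (result == min) {
            throw std::out_of_range("value does not fit in int");
        }
        result = -result;
    }
    return result;
}
```

Accumulating on the negative side avoids a special case for the minimum value, whose magnitude is one larger than the maximum in two's complement.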

David Abrahams <dave <at> boostpro.com> writes:
This can be easily done only if a program always uses the classic C and C++ locales. Optimization for some combinations of types is already available if you set BOOST_LEXICAL_CAST_ASSUME_C_LOCALE. Mapping between C and C++ locales is hard, if possible at all in a portable way. -- Alex
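The locale concern can be made concrete with a small sketch. A stream imbued with a locale that groups digits produces different text from snprintf with "%d", so a fast path that bypasses the stream is only equivalent under the classic locale. The facet dotted_thousands and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes separated by '.'.
struct dotted_thousands : std::numpunct<char> {
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Stream-based formatting honours the locale imbued on the stream.
std::string format_with_stream(int value, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

// snprintf with "%d" never inserts grouping characters.
std::string format_with_snprintf(int value) {
    char buf[32];
    const int len = std::snprintf(buf, sizeof buf, "%d", value);
    assert(len > 0 && len < static_cast<int>(sizeof buf));
    return std::string(buf, static_cast<std::size_t>(len));
}
```

With a locale built as std::locale(std::locale::classic(), new dotted_thousands), the stream version formats 1234567 with separators while the snprintf version does not, which is the mismatch an unconditional fast path would introduce.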
participants (9)
- Alexander Nasonov
- Darren Garvey
- David Abrahams
- Jonathan Brannan
- Mika Heiskanen
- Othman, Ossama
- remi.chateauneu@gmx.de
- Steven Watanabe
- Thorsten Ottosen