[lexical_cast] optimization committed to HEAD

Hello,

I recently committed version 1.26 of lexical_cast.hpp, which optimizes many combinations of types.

How it works:

lexical_cast reserves a local buffer for bool, char, wchar_t and other integral types, then writes a string representation of the source into that buffer. Currently, none of these algorithms use ostream at all, but it's easy to implement a generic algorithm that sets the put area of an ostream object to point to that buffer and writes the string representation there.

When the source type is a pointer to a char/wchar_t array or a basic_string<>, its value is already a string representation of itself, so the call to the ostream output operator is omitted.

Further optimization is applied when the target type is either char/wchar_t or basic_string<>. In this case, std streams are not involved in the conversion at all. For all other target types, a basic_istream object is constructed, its get area is set to the string representation of the source, and operator>> is called.

Optimization for the following source types is ON:
  bool
  char
  wchar_t
  other integral types
  char*, char const*, char[], char const[]
  wchar_t*, wchar_t const*, wchar_t[], wchar_t const[]
  std::basic_string<>

Optimization for the following source types is OFF:
  enums
  float
  double
  signed char
  unsigned char
  signed char*, signed char const*, signed char[], signed char const[]
  unsigned char*, unsigned char const*, unsigned char[], unsigned char const[]
  all other types not listed in the ON list

Tested on VC6, VC7.1, VC8, gcc 3.4 and Borland 5.5. I couldn't run the tests on Borland, so there I only checked that it compiles.

Diff:
  cvs diff -r 1.24 -r 1.26 boost/lexical_cast.hpp
  http://tinyurl.com/qu388

Performance:
Compiled with gcc 3.4.4 on FreeBSD 6.1 with the -O2 flag turned on.

A conversion from an int value [0, 9] to char:
  With Boost 1.33.1 - 2.012 s
  With Boost HEAD   - 0.322 s, 6.24 times faster
  Ignore locales    - 0.083 s

A conversion from an int value [0, 999999] to std::string:
  With Boost 1.33.1 - 2.516 s
  With Boost HEAD   - 0.844 s, 2.98 times faster
  Ignore locales    - 0.626 s

TODO
- Optimization for float and double.
- Make MEASURE_LEXICAL_CAST_PERFORMANCE_WITHOUT_LOCALE_OVERHEAD a public configuration parameter.
- Discuss the behavior of enums with a user-defined output operator.

--
Alexander Nasonov
Project Manager at Akmosoft ( http://www.akmosoft.com )
Blog: http://nasonov.blogspot.com
Email: $(FirstName) dot $(LastName) at gmail dot com
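The conversion path described in this message can be sketched roughly as follows. This is only an illustration under assumptions, not the code in lexical_cast.hpp: the names range_streambuf, format_int, int_to_string and int_to are hypothetical, and only int sources and narrow characters are handled. It shows the two ideas from the description: formatting into a local buffer without any ostream, and, for non-string targets, pointing the get area of an istream at that buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <streambuf>
#include <string>

// Hypothetical stream buffer whose get area is an existing character range.
// No characters are copied; the range must outlive the stream.
class range_streambuf : public std::streambuf
{
public:
    range_streambuf(char* begin, char* end) { setg(begin, begin, end); }
};

// Writes the decimal form of value backwards, finishing just before end.
// Returns a pointer to the first character written.
char* format_int(int value, char* end)
{
    // Unsigned arithmetic, so that negating the minimum int cannot overflow.
    unsigned int magnitude = value < 0
        ? 0u - static_cast<unsigned int>(value)
        : static_cast<unsigned int>(value);
    char* begin = end;
    do {
        *--begin = static_cast<char>('0' + magnitude % 10u);
        magnitude /= 10u;
    } while (magnitude != 0u);
    if (value < 0)
        *--begin = '-';
    return begin;
}

// Room for every decimal digit of an int plus a sign.
const std::size_t int_buffer_size = std::numeric_limits<int>::digits10 + 2;

// String target: no standard streams are involved at all.
std::string int_to_string(int value)
{
    char buffer[int_buffer_size];
    char* const end = buffer + int_buffer_size;
    char* const begin = format_int(value, end);
    assert(begin >= buffer);
    return std::string(begin, end);
}

// Any other target: an istream reads directly from the local buffer.
template <class Target>
bool int_to(int value, Target& result)
{
    char buffer[int_buffer_size];
    char* const end = buffer + int_buffer_size;
    range_streambuf source(format_int(value, end), end);
    std::istream in(&source);
    // Succeed only if extraction worked and consumed the whole buffer.
    return (in >> result) && in.get() == std::char_traits<char>::eof();
}
```

With these helpers, int_to_string(42) builds the string straight from the stack buffer, while int_to(42, some_long) goes through operator>> on a stream that never allocates a buffer of its own.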

Alexander Nasonov wrote:
Hello, I recently committed the version 1.26 of lexical_cast.hpp that optimizes many combinations of types.
Cool, but can you please double check the line endings: after a cvs update yesterday lexical_cast wouldn't compile with vc8, because it complained about Mac line endings. Letting my IDE convert the line endings made matters even worse (all the multi-line macros broke!) If I remember correctly this problem occurs if you edit the file on windows and then commit from a linux box: the file ends up with \r\n line endings in cvs, which get translated to \r\r\n when updating from windows :-( Many thanks! John.
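The \r\r\n effect can be reproduced with a small sketch. It assumes, as the message suggests, that a Windows checkout expands every \n stored in the repository to \r\n regardless of what precedes it; expand_newlines is a hypothetical stand-in for that translation, not a description of what cvs actually does internally.

```cpp
#include <string>

// Simulates a text-mode checkout on Windows: every '\n' in the repository
// copy is expanded to "\r\n", whatever character precedes it.
std::string expand_newlines(const std::string& repository_text)
{
    std::string result;
    result.reserve(repository_text.size());
    for (std::string::size_type i = 0; i != repository_text.size(); ++i) {
        if (repository_text[i] == '\n')
            result += '\r';  // inserted even if a '\r' is already there
        result += repository_text[i];
    }
    return result;
}
```

A file stored with plain \n comes out with \r\n, but a file stored with \r\n comes out with \r\r\n. A tool that treats the stray \r as a line break of its own would then see an empty line after each continuation backslash, which would be consistent with the multi-line macros breaking.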

John Maddock wrote:
Cool, but can you please double check the line endings: after a cvs update yesterday lexical_cast wouldn't compile with vc8, because it complained about Mac line endings. Letting my IDE convert the line endings made matters even worse (all the multi-line macros broke!)
If I remember correctly this problem occurs if you edit the file on windows and then commit from a linux box: the file ends up with \r\n line endings in cvs, which get translated to \r\r\n when updating from windows :-(
You're right except that it was FreeBSD ;)
-- Alexander Nasonov

Alexander Nasonov said: (by the date of Sat, 26 Aug 2006 21:27:23 +0000)
Hello, I recently committed the version 1.26 of lexical_cast.hpp that optimizes many combinations of types.
How it works:
The lexical_cast reserves a local buffer for bool, char, wchar_t and other integral types.
great, but is this code thread safe? Think about several different threads calling lexical_cast<> simultaneously. Will this local buffer get mangled in such situation? -- Janek Kozicki

Janek Kozicki wrote:
Alexander Nasonov said: (by the date of Sat, 26 Aug 2006 21:27:23 +0000)
Hello, I recently committed the version 1.26 of lexical_cast.hpp that optimizes many combinations of types.
How it works:
The lexical_cast reserves a local buffer for bool, char, wchar_t and other integral types.
great, but is this code thread safe? Think about several different threads calling lexical_cast<> simultaneously. Will this local buffer get mangled in such situation?
The current lexical_cast makes no claim of thread safety so why would you expect an optimization to? Jeff

Jeff Garland wrote:
The current lexical_cast makes no claim of thread safety so why would you expect an optimization to?
The default level of thread safety shall always be "basic", and since lexical_cast has no state, this means that it should be usable in a threaded program without explicit synchronization. Components wouldn't be able to use lexical_cast for fear of stepping on each other's toes otherwise.

Peter Dimov wrote:
Jeff Garland wrote:
The current lexical_cast makes no claim of thread safety so why would you expect an optimization to?
The default level of thread safety shall always be "basic", and since lexical_cast has no state, this means that it should be usable in a threaded program without explicit synchronization.
Components wouldn't be able to use lexical_cast for fear of stepping on each other's toes otherwise.
They can't. lexical_cast uses iostreams, which have global data such as the global locale and give no guarantee of "thread safety" w.r.t. this global state. Jeff
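The global state in question can be made visible with a short sketch. It uses std::ostringstream directly rather than lexical_cast, and the facet comma_decimal_point and the two helper functions are hypothetical names introduced only for this illustration. A freshly constructed stream picks up whatever the global locale is at that moment, so code elsewhere in the program that calls std::locale::global changes the result of an otherwise identical conversion.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that uses ',' as the decimal point.
struct comma_decimal_point : std::numpunct<char>
{
protected:
    char do_decimal_point() const { return ','; }
};

// Formats through a freshly constructed stream, which is imbued with the
// global locale at construction time.
std::string format_double(double value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// Temporarily installs loc as the global locale around one conversion.
// Another thread converting at the same time would observe the change too.
std::string format_double_under(const std::locale& loc, double value)
{
    const std::locale previous = std::locale::global(loc);
    const std::string text = format_double(value);
    std::locale::global(previous);
    return text;
}
```

Nothing in format_double itself mentions a locale, yet its output depends on one. That is the kind of shared state the reply above refers to.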

Janek Kozicki wrote:
great, but is this code thread safe? Think about several different threads calling lexical_cast<> simultaneously. Will this local buffer get mangled in such situation?
local buffer == allocated on stack. It is thread safe. -- Alexander Nasonov
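A minimal sketch of that point, assuming a C++11 compiler with std::thread (which postdates this thread): the buffer below has automatic storage, so each call, and therefore each thread, works on its own copy. The names format_local and format_in_threads are hypothetical, and snprintf stands in for the conversion code in lexical_cast.hpp.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Formats into a buffer with automatic storage. Every call gets a fresh
// buffer on the calling thread's stack, so calls cannot overwrite each other.
std::string format_local(int value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d", value);
    return std::string(buffer);
}

// Runs format_local concurrently, one thread per value, and collects results.
std::vector<std::string> format_in_threads(const std::vector<int>& values)
{
    std::vector<std::string> results(values.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i != values.size(); ++i) {
        // Each worker writes only to its own slot of results.
        workers.emplace_back([&results, &values, i] {
            results[i] = format_local(values[i]);
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return results;
}
```

This only demonstrates that the buffer itself is not shared. It says nothing about other state a conversion may touch, such as locales or the string implementation.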

That's not true. std::basic_string<T> is not thread safe. I spent a couple of months early this year dealing with that fact and finally came up with an alternative string class that has the same interface as std::string but does not use the copy-on-write technique.
On 8/27/06, Alexander Nasonov <alnsn@yandex.ru> wrote:
Janek Kozicki wrote:
great, but is this code thread safe? Think about several different threads calling lexical_cast<> simultaneously. Will this local buffer get mangled in such situation?
local buffer == allocated on stack. It is thread safe.
-- Alexander Nasonov
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Thanks, Greg

On Sun, 27 Aug 2006 18:37:49 -0700, "Gregory Dai" <gregory.dai@gmail.com> wrote:
That's not true. std::basic_string<T> is not thread safe. I spent couple months early this year to deal with that fact and finally came up with an alternative string class with the same interface as std::string but does not use the copy-on-write technique.
Did you think it had a dynamically allocated buffer _on the stack_? Where do you hold the characters of *your* string class instances? C'mon. -- [ Gennaro Prota. C++ developer, Library designer. ] [ For Hire http://gennaro-prota.50webs.com/ ]

Perhaps we were not on the same page. What I meant was that as long as std::string is used as either the source or the target of a lexical_cast, thread safety may not be guaranteed (due to the lack of a guarantee by std::string), though weird/bad things happen more frequently under heavy load and/or on multiprocessor (hyperthreading included) machines.
On 8/28/06, Gennaro Prota <gennaro_prota@yahoo.com> wrote:
On Sun, 27 Aug 2006 18:37:49 -0700, "Gregory Dai" <gregory.dai@gmail.com> wrote:
That's not true. std::basic_string<T> is not thread safe. I spent couple months early this year to deal with that fact and finally came up with an alternative string class with the same interface as std::string but does not use the copy-on-write technique.
Did you think it had a dynamically allocated buffer _on the stack_? Where do you hold the characters of *your* string class instances? C'mon.
-- [ Gennaro Prota. C++ developer, Library designer. ] [ For Hire http://gennaro-prota.50webs.com/ ]
-- Thanks, Greg

Gregory Dai wrote:
Perhaps we were not on the same page. What I meant was as long as std::string is used as either the source or target of a lexical_cast, thread safety may not be guaranteed (due to lack of a guarantee by std::string). Though weird/bad things happen more frequently under heavy load and/or on multiprocessor (hyperthreading included) machines.
Right but what do you expect us to do about components like std::string that are beyond our control? In any case all the implementations I'm aware of are minimally thread safe these days, if that's not the case I'd certainly like to hear about it. BTW implementations that used copy-on-write in the early days almost all (all?) switched away from that precisely because of threading issues. John.

On 8/31/06, John Maddock <john@johnmaddock.co.uk> wrote:
In any case all the implementations I'm aware of are minimally thread safe these days, if that's not the case I'd certainly like to hear about it. BTW implementations that used copy-on-write in the early days almost all (all?) switched away from that precisely because of threading issues.
Even the latest libstdc++ (GNU's Standard C++ library) uses a refcounted std::string. One needs to be careful when sharing strings across thread boundaries. See these bug reports for a detailed discussion: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10350 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334 -- Caleb Epstein

Caleb Epstein wrote:
On 8/31/06, John Maddock <john@johnmaddock.co.uk> wrote:
In any case all the implementations I'm aware of are minimally thread safe these days, if that's not the case I'd certainly like to hear about it. BTW implementations that used copy-on-write in the early days almost all (all?) switched away from that precisely because of threading issues.
Even the latest libstdc++ (GNU's Standard C++ library) uses a refcounted std::string.
Darn, I hadn't realised that :-(
One needs to be careful when sharing strings across thread boundaries. See these bug reports for a detailed discussion:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10350 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334
Very interesting stuff! I'm not sure that it's relevant in this case though: if you're lexical_cast'ing *to* a string then these issues shouldn't arise, whether there are other threads getting involved or not. If you're casting *from* a string, then it appears that you need a mutex if you know the string is likely to be visible to other threads, even if all accesses to it are non-mutating. Have I got that right? Thanks, John.
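The "casting *from* a string" case might look like the sketch below. It is an illustration under assumptions rather than a recommendation from this thread: shared_text and parse_shared_int are hypothetical names, std::mutex requires C++11, std::istringstream stands in for lexical_cast, and every thread that touches the string is assumed to take the same lock.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical holder for a string that several threads can see.
struct shared_text
{
    std::mutex guard;
    std::string value;
};

// Takes a private copy of the shared string while holding the lock, then
// converts the copy without any further synchronization.
bool parse_shared_int(shared_text& source, int& result)
{
    std::string local;
    {
        std::lock_guard<std::mutex> lock(source.guard);
        // Copy the characters rather than the string object, so that a
        // reference-counted implementation has no reason to share its buffer.
        local.assign(source.value.data(), source.value.size());
    }
    std::istringstream in(local);
    // Succeed only if extraction worked and consumed the whole string.
    return (in >> result) && in.get() == std::char_traits<char>::eof();
}
```

The lock is held only for the copy, so the conversion itself runs on data that no other thread can reach.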

On 8/31/06, John Maddock <john@johnmaddock.co.uk> wrote:
Caleb Epstein wrote:
On 8/31/06, John Maddock <john@johnmaddock.co.uk> wrote:
In any case all the implementations I'm aware of are minimally thread safe these days, if that's not the case I'd certainly like to hear about it. BTW implementations that used copy-on-write in the early days almost all (all?) switched away from that precisely because of threading issues.
Even the latest libstdc++ (GNU's Standard C++ library) uses a refcounted std::string.
Darn, I hadn't realised that :-(
One needs to be careful when sharing strings across thread boundaries. See these bug reports for a detailed discussion:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10350 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334
Very interesting stuff!
I'm not sure that it's relevant in this case though: if you're lexical_cast'ing *to* a string then these issues shouldn't arise, whether there are other threads getting involved or not. If you're casting *from* a string, then it appears that you need a mutex if you know the string is likely to be visible to other threads, even if all accesses to it are non-mutating.
Have I got that right?
Well, perhaps not completely. E.g., at least if the target is a std::string of size 0, the resulting std::string may well be reference-counted and shared by multiple threads. WRT std::string, the solution appears to be using a string implementation that does not use reference counting, as you rightly pointed out. The GCC folks can resolve the bugs reported against the GCC library by replacing their std::string implementation. In most cases, just as we can use std::string without worrying about threading issues, we can do the same with lexical_cast. BTW, I like lexical_cast a lot! Thanks to this community for making it available.
Thanks, John.
-- Thanks, Greg

Gregory Dai wrote:
Have I got that right?
Well, not completely perhaps. E.g., at least if the target is a std::string of size=0, the resultant std::string may well be reference-counted and shared by multiple threads.
Can you explain some more? Are you saying that if two unrelated threads each default construct a string (or maybe construct a zero length string?) then they will be entangled by reference counting? Thanks, John.

On 9/1/06, John Maddock <john@johnmaddock.co.uk> wrote:
Gregory Dai wrote:
Have I got that right?
Well, not completely perhaps. E.g., at least if the target is a std::string of size=0, the resultant std::string may well be reference-counted and shared by multiple threads.
Can you explain some more? Are you saying that if two unrelated threads each default construct a string (or maybe construct a zero length string?) then they will be entangled by reference counting?
Well, I don't have definitive proof, but that's possible from my experience. E.g., we have many classes with ctors like this:
ctor(..., std::string const& s = std::string());
More than a few developers have found that a crash could be made to disappear by changing the type of "s" to char const* (and, of course, passing the string as string.c_str() to the ctor). We simply had too many such crashes, so (also for a few other reasons) our eventual solution was to replace std::string with one that does not use the copy-on-write technique.
Thanks, John.
-- Thanks, Greg

On Fri, 1 Sep 2006 15:26:11 -0700, "Gregory Dai" <gregory.dai@gmail.com> wrote:
Well, I don't have definitive proof, but that's possible from my experience.
E.g., we have many classes with ctors like this: ctor(..., std::string const& s = std::string()); More than a few developers had such experience that a crash could well be made disappear by changing the type of "s" to char const* (and, of course, passing whatever the string as string.c_str() to the ctor).
That experience has been given a specific name: "programming by coincidence". -- [ Gennaro Prota. C++ developer, Library designer. ] [ For Hire http://gennaro-prota.50webs.com/ ]

On 9/5/06, Gennaro Prota <gennaro_prota@yahoo.com> wrote:
On Fri, 1 Sep 2006 15:26:11 -0700, "Gregory Dai" <gregory.dai@gmail.com> wrote:
Well, I don't have definitive proof, but that's possible from my experience.
E.g., we have many classes with ctors like this: ctor(..., std::string const& s = std::string()); More than a few developers had such experience that a crash could well be made disappear by changing the type of "s" to char const* (and, of course, passing whatever the string as string.c_str() to the ctor).
That experience has been given a specific name: "programming by coincidence".
I admit I'm guilty of that, as are some of my colleagues. But, as in life, one may not be smart and experienced enough, nor have the time, to get to the bottom of everything in programming.
-- [ Gennaro Prota. C++ developer, Library designer. ] [ For Hire http://gennaro-prota.50webs.com/ ]
-- Thanks, Greg

On 8/31/06, John Maddock <john@johnmaddock.co.uk> wrote:
Gregory Dai wrote:
Perhaps we were not on the same page. What I meant was as long as std::string is used as either the source or target of a lexical_cast, thread safety may not be guaranteed (due to lack of a guarantee by std::string). Though weird/bad things happen more frequently under heavy load and/or on multiprocessor (hyperthreading included) machines.
Right but what do you expect us to do about components like std::string that are beyond our control?
No, I certainly was not blaming the boost community for some shortcomings of std::string.
In any case all the implementations I'm aware of are minimally thread safe these days, if that's not the case I'd certainly like to hear about it. BTW implementations that used copy-on-write in the early days almost all (all?) switched away from that precisely because of threading issues.
John.
-- Thanks, Greg
participants (8)
- Alexander Nasonov
- Caleb Epstein
- Gennaro Prota
- Gregory Dai
- Janek Kozicki
- Jeff Garland
- John Maddock
- Peter Dimov