patch - performance problem in boost string algorithms

Hello, I've been testing the performance of the Boost string algorithms library using the below program "boost_string_benchmark.cpp" as a simple benchmark. Here are the results I get using the CVS copy of Boost for two compilers and four methods of coding a left trim: MSVC++2005: 1: 4171 2: 3235 3: 3312 4: 4860 (run times) G++/Intel: 1: 1657 2: 1156 3: 1172 4: 2953 Now, I was getting order-of-magnitude lower run times with direct C coding of trim_left, so I made some minor changes to the Boost source in hope of improving this (below diff.txt). Here are the results after the changes: MSVC++2005: 1: 2406 2: 2359 3: 1500 4: 297 (run times) G++/Intel 1: 703 2: 703 3: 188 4: 172 It's a good improvement. In particular, the results using method #4 are much more like the order-of-magnitude improvement I saw with direct C coding. Therefore, such results are certainly achievable with Boost after a reasonable patch to Boost and coding in a more awkward way (method #4). It would be nice if I could obtain the same speedup using more typical coding methods (method #1). I bring this up for discussions since others here are much more familiar with the Boost internals than I am. Is is not reasonable to expect Boost string processing performance to be almost as fast as direct C coding? I only looked into this some, but it seemed that repeated temporary object construction was to blame for much of the slowdown, and the patch only partly addresses that. BTW, Boost is a really great project, and I appreciate the work you do. I have some other suggestions on Boost relating to flex_string support that I'll leave for a future discussion. --davidm <snip> // boost_string_benchmark.cpp // benchmarks for Boost string algorithms // David Manura, 20060719 // VC++2005: cl /EHsc /D_SCL_SECURE_NO_DEPRECATE /D_CRT_SECURE_NO_DEPRECATE /O2 /DNDEBUG /MT /D_MBCS boost_string_benchmark.cpp // GCC: g++ -O2 -DNDEBUG boost_string_benchmark.cpp #include <boost/algorithm/string.hpp> #include <ctime> #include <iostream> using namespace boost; using namespace std; const int num_loops = 3000000; struct cstring { typedef const char * const_iterator; typedef char * iterator; char * & b_; char * e_; iterator begin() { return b_; } iterator end() { return e_; } cstring(char * s) : b_(s), e_(s + strlen(s)) { } char * erase(char * p1, char * p2) { // for testing, we may comment this implementation out. //memmove(b_, p2, e_-p2+1); //e_ -= (p2 - p1); return p1; } }; template<class T1, class T2, class T3, class T4> void compare() { for (int n=0; n<3; n++) { clock_t t0 = clock(); T1::run(); clock_t t1 = clock(); T2::run(); clock_t t2 = clock(); T3::run(); clock_t t3 = clock(); T4::run(); clock_t t4 = clock(); cout << " 1: " << (t1 - t0) << " 2: " << (t2 - t1) << " 3: " << (t3 - t2) << " 4: " << (t4 - t3) << endl; } } template <class String> struct test1_string1 { static void run() { for (int n=0; n<num_loops; n++) { char c[] = " 12345678901234567890 "; String s(c); trim_left(s); } } }; template <class String> struct test1_string2 { static void run() { const std::locale loc; for (int n=0; n<num_loops; n++) { char c[] = " 12345678901234567890 "; String s(c); trim_left(s, loc); } } }; template <class String> struct test1_string3 { static void run() { const std::locale loc; const boost::algorithm::detail::is_classifiedF isspace2 = is_space(loc); for (int n=0; n<num_loops; n++) { char c[] = " 12345678901234567890 "; String s(c); trim_left_if(s, isspace2); } } }; template <class String> struct test1_string4 { static void run() { const boost::algorithm::detail::is_any_ofF<char> isspace2 = is_any_of(" "); for (int n=0; n<num_loops; n++) { char c[] = " 12345678901234567890 "; String s(c); trim_left_if(s, isspace2); } } }; template <class String> struct tests1 { static void run() { compare<test1_string1<String>, test1_string2<String>, test1_string3<String>, test1_string4<String> >(); } }; int main() { tests1<cstring>::run(); //tests1<std::string>::run(); return 0; } </snip> <snip> diff -udr boost/boost/algorithm/string/detail/trim.hpp boost-patched/boost/algorithm/string/detail/trim.hpp --- boost/boost/algorithm/string/detail/trim.hpp 2004-03-04 17:11:41.000000000 -0500 +++ boost-patched/boost/algorithm/string/detail/trim.hpp 2006-07-19 21:12:59.703125000 -0400 @@ -12,11 +12,14 @@ #include <boost/algorithm/string/config.hpp> #include <boost/detail/iterator.hpp> +#include <locale> namespace boost { namespace algorithm { namespace detail { + static const std::locale default_locale_; + // trim iterator helper -----------------------------------------------// // Search for first non matching character from the beginning of the sequence @@ -24,7 +27,7 @@ inline ForwardIteratorT trim_begin( ForwardIteratorT InBegin, ForwardIteratorT InEnd, - PredicateT IsSpace ) + const PredicateT & IsSpace ) { ForwardIteratorT It=InBegin; for(; It!=InEnd; ++It ) @@ -41,7 +44,7 @@ inline ForwardIteratorT trim_end( ForwardIteratorT InBegin, ForwardIteratorT InEnd, - PredicateT IsSpace ) + const PredicateT & IsSpace ) { typedef BOOST_STRING_TYPENAME boost::detail:: iterator_traits<ForwardIteratorT>::iterator_category category; @@ -53,7 +56,7 @@ inline ForwardIteratorT trim_end_iter_select( ForwardIteratorT InBegin, ForwardIteratorT InEnd, - PredicateT IsSpace, + const PredicateT & IsSpace, std::forward_iterator_tag ) { ForwardIteratorT TrimIt=InBegin; @@ -74,7 +77,7 @@ inline ForwardIteratorT trim_end_iter_select( ForwardIteratorT InBegin, ForwardIteratorT InEnd, - PredicateT IsSpace, + const PredicateT & IsSpace, std::bidirectional_iterator_tag ) { for( ForwardIteratorT It=InEnd; It!=InBegin; ) Only in boost-patched/boost/algorithm/string/detail: trim.hpp~ Only in boost-patched/boost/algorithm/string/detail: util.hpp~ diff -udr boost/boost/algorithm/string/trim.hpp boost-patched/boost/algorithm/string/trim.hpp --- boost/boost/algorithm/string/trim.hpp 2005-06-06 16:03:18.000000000 -0400 +++ boost-patched/boost/algorithm/string/trim.hpp 2006-07-19 21:41:07.515625000 -0400 @@ -58,7 +58,7 @@ inline OutputIteratorT trim_left_copy_if( OutputIteratorT Output, const RangeT& Input, - PredicateT IsSpace) + const PredicateT& IsSpace) { std::copy( ::boost::algorithm::detail::trim_begin( @@ -76,7 +76,7 @@ \overload */ template<typename SequenceT, typename PredicateT> - inline SequenceT trim_left_copy_if(const SequenceT& Input, PredicateT IsSpace) + inline SequenceT trim_left_copy_if(const SequenceT& Input, const PredicateT& IsSpace) { return SequenceT( ::boost::algorithm::detail::trim_begin( @@ -98,7 +98,7 @@ \note This function provides the strong exception-safety guarantee */ template<typename SequenceT> - inline SequenceT trim_left_copy(const SequenceT& Input, const std::locale& Loc=std::locale()) + inline SequenceT trim_left_copy(const SequenceT& Input, const std::locale& Loc=detail::default_locale_) { return trim_left_copy_if( @@ -116,7 +116,7 @@ \param IsSpace An unary predicate identifying spaces */ template<typename SequenceT, typename PredicateT> - inline void trim_left_if(SequenceT& Input, PredicateT IsSpace) + inline void trim_left_if(SequenceT& Input, const PredicateT& IsSpace) { Input.erase( begin(Input), @@ -135,7 +135,7 @@ \param Loc A locale used for 'space' classification */ template<typename SequenceT> - inline void trim_left(SequenceT& Input, const std::locale& Loc=std::locale()) + inline void trim_left(SequenceT& Input, const std::locale& Loc=detail::default_locale_) { trim_left_if( Input, @@ -164,7 +164,7 @@ inline OutputIteratorT trim_right_copy_if( OutputIteratorT Output, const RangeT& Input, - PredicateT IsSpace ) + const PredicateT& IsSpace ) { std::copy( begin(Input), @@ -182,7 +182,7 @@ \overload */ template<typename SequenceT, typename PredicateT> - inline SequenceT trim_right_copy_if(const SequenceT& Input, PredicateT IsSpace) + inline SequenceT trim_right_copy_if(const SequenceT& Input, const PredicateT& IsSpace) { return SequenceT( begin(Input), @@ -205,7 +205,7 @@ \note This function provides the strong exception-safety guarantee */ template<typename SequenceT> - inline SequenceT trim_right_copy(const SequenceT& Input, const std::locale& Loc=std::locale()) + inline SequenceT trim_right_copy(const SequenceT& Input, const std::locale& Loc=detail::default_locale_) { return trim_right_copy_if( @@ -224,7 +224,7 @@ \param IsSpace An unary predicate identifying spaces */ template<typename SequenceT, typename PredicateT> - inline void trim_right_if(SequenceT& Input, PredicateT IsSpace) + inline void trim_right_if(SequenceT& Input, const PredicateT& IsSpace) { Input.erase( ::boost::algorithm::detail::trim_end( @@ -245,7 +245,7 @@ \param Loc A locale used for 'space' classification */ template<typename SequenceT> - inline void trim_right(SequenceT& Input, const std::locale& Loc=std::locale()) + inline void trim_right(SequenceT& Input, const std::locale& Loc=detail::default_locale_) { trim_right_if( Input, @@ -274,7 +274,7 @@ inline OutputIteratorT trim_copy_if( OutputIteratorT Output, const RangeT& Input, - PredicateT IsSpace) + const PredicateT& IsSpace) { BOOST_STRING_TYPENAME range_const_iterator<RangeT>::type TrimEnd= @@ -298,7 +298,7 @@ \overload */ template<typename SequenceT, typename PredicateT> - inline SequenceT trim_copy_if(const SequenceT& Input, PredicateT IsSpace) + inline SequenceT trim_copy_if(const SequenceT& Input, const PredicateT& IsSpace) { BOOST_STRING_TYPENAME range_const_iterator<SequenceT>::type TrimEnd= @@ -328,7 +328,7 @@ \note This function provides the strong exception-safety guarantee */ template<typename SequenceT> - inline SequenceT trim_copy( const SequenceT& Input, const std::locale& Loc=std::locale() ) + inline SequenceT trim_copy( const SequenceT& Input, const std::locale& Loc=detail::default_locale_) { return trim_copy_if( @@ -346,7 +346,7 @@ \param IsSpace An unary predicate identifying spaces */ template<typename SequenceT, typename PredicateT> - inline void trim_if(SequenceT& Input, PredicateT IsSpace) + inline void trim_if(SequenceT& Input, const PredicateT& IsSpace) { trim_right_if( Input, IsSpace ); trim_left_if( Input, IsSpace ); @@ -361,7 +361,7 @@ \param Loc A locale used for 'space' classification */ template<typename SequenceT> - inline void trim(SequenceT& Input, const std::locale& Loc=std::locale()) + inline void trim(SequenceT& Input, const std::locale& Loc=detail::default_locale_) { trim_if( Input, </snip>

David Manura said: (by the date of Thu, 20 Jul 2006 00:08:32 -0400)
Now, I was getting order-of-magnitude lower run times with direct C coding of trim_left, so I made some minor changes to the Boost source in hope of improving this (below diff.txt).
<snip>
- PredicateT IsSpace ) + const PredicateT & IsSpace )
wow. It's a good programming practice to use const& everywhere where possible. T'd love to see this patch applied. Thanks! -- Janek Kozicki |

Janek Kozicki <janek_listy <at> wp.pl> writes:
wow. It's a good programming practice to use const& everywhere where possible. T'd love to see this patch applied. Thanks!
The below second patch provides much further improvement. I realize the second patch is somewhat of a hack, but it illustrates where the rest of the slowdown is coming from and may suggest a better way of fixing it. Before second patch: MSVC++2005: 1: 2406 2: 2359 3: 1500 4: 297 (run times) G++/Intel: 1: 703 2: 703 3: 188 4: 172 After second patch: MSVC++2005: 1: 235 2: 250 3: 251 4: 282 (run times) G++/Intel: 1: 141 2: 141 3: 109 4: 172 There's likely other places in the string algorithms code that could benefit from similar changes. --davidm <snip> diff -udr boost/boost/algorithm/string/detail/classification.hpp boost-patched/boost/algorithm/string/detail/classification.hpp --- boost/boost/algorithm/string/detail/classification.hpp 2006-04-16 05:46:34.000000000 -0400 +++ boost-patched/boost/algorithm/string/detail/classification.hpp 2006-07-20 08:58:02.359849300 -0400 @@ -21,6 +21,7 @@ #include <boost/algorithm/string/predicate_facade.hpp> #include <boost/type_traits/remove_const.hpp> +#include <locale> namespace boost { namespace algorithm { @@ -36,14 +37,15 @@ template <class Args> struct sig { typedef bool type; }; // Constructor from a locale - is_classifiedF(std::ctype_base::mask Type, std::locale const & Loc = std::locale()) : + is_classifiedF(std::ctype_base::mask Type, std::locale const & Loc = default_locale_) : m_Type(Type), m_Locale(Loc) {} // Operation template<typename CharT> bool operator()( CharT Ch ) const { - return std::use_facet< std::ctype<CharT> >(m_Locale).is( m_Type, Ch ); + static const std::ctype<CharT> & data = std::use_facet< std::ctype<CharT> >(m_Locale); + return data.is( m_Type, Ch ); } #if defined(__BORLANDC__) && (__BORLANDC__ >= 0x560) && (__BORLANDC__ <= 0x582) && !defined(_USE_OLD_RW_STL) @@ -56,7 +58,7 @@ private: const std::ctype_base::mask m_Type; - const std::locale m_Locale; + const std::locale & m_Locale; }; // is_any_of functor </snip>

David Manura wrote:
// Operation template<typename CharT> bool operator()( CharT Ch ) const { - return std::use_facet< std::ctype<CharT> >(m_Locale).is( m_Type, Ch ); + static const std::ctype<CharT> & data = std::use_facet< std::ctype<CharT> >(m_Locale); + return data.is( m_Type, Ch ); }
That is most definitely a bug waiting to pounce. m_Locale is specific to each instance. The static variable inside the member function is the same for all instances. The bug hits when somebody uses this functor with two different locales. The first object to invoke operator() will work fine - the second (and each subsequent) will mysteriously use the locale of the first. If you want to cache the ctype facet, do it as a class variable, perhaps in the constructor. Sebastian Redl

Hi, First of all, I'd like to thank you for the nice benchmarks. Now my comments to your patches: 1.) const Predicate& Pred Although this change can bring a performance benefits, it has one big drawback. You cannot pass an ordinary C function as predicate. At least not in VC++ 7.1 Following code will not work: int func(int a) { return a+2; } template<typename TFunc> int sum(TFunc f, unsigned int a) { int acum=0; for(unsigned int i=0; i<a; i++) { acum+=f(i); } return acum; } ... sum(func, 1000); I consider this is to be very important functionality, therefor it is implemented the way it is. Your investigation however pushed me to investigate this matter once more.It seems, that gcc have no problem compiling it even in ansi mode. I'm not sure what is the correct Standard explanation. So I'll think about it and try to come up with some improvements here. 2.) std::locale() I'm not a big fan of global objects when they are not necessary. Correct locale implementation should give only a very small footprint when instantiating std::locale(), since is all should be reference based. As seen from your benchmarks, there is great difference between g++ and msvc. On the other hand, the difference on the vc++ is so large, that it is worth considering. 3.) replacing member with reference in the classification predicated. This patch is definitely not acceptable due to reasons that were partialy explained by Sebastian. Classisfication predicates could be instantiated elsewhere and the lifetime of the predicate is in no way bounded to the lifeting of the locales object. So there is very good chance for "dangling reference" problems. So that's it for now from my side. Once again thanks for the good work on the benchmarks. I will definitely try to use the information you have provided to improve the library. Best regards, Pavol David Manura wrote:
Janek Kozicki <janek_listy <at> wp.pl> writes:
wow. It's a good programming practice to use const& everywhere where possible. T'd love to see this patch applied. Thanks!
The below second patch provides much further improvement. I realize the second patch is somewhat of a hack, but it illustrates where the rest of the slowdown is coming from and may suggest a better way of fixing it.
Before second patch:
MSVC++2005: 1: 2406 2: 2359 3: 1500 4: 297 (run times) G++/Intel: 1: 703 2: 703 3: 188 4: 172
After second patch:
MSVC++2005: 1: 235 2: 250 3: 251 4: 282 (run times) G++/Intel: 1: 141 2: 141 3: 109 4: 172
There's likely other places in the string algorithms code that could benefit from similar changes.
--davidm

Pavol Droba wrote:
Hi,
First of all, I'd like to thank you for the nice benchmarks.
Now my comments to your patches:
1.) const Predicate& Pred Although this change can bring a performance benefits, it has one big drawback. You cannot pass an ordinary C function as predicate. At least not in VC++ 7.1
Following code will not work:
int func(int a) { return a+2; }
template<typename TFunc> int sum(TFunc f, unsigned int a) { int acum=0; for(unsigned int i=0; i<a; i++) { acum+=f(i); }
return acum; }
...
sum(func, 1000);
It is surprising for me. VC++ seems to need overloads: int func(int a) { return a+2; } struct func_t { int operator()(int a) const { return a+2; } }; template<typename TFunc> int sum_aux(TFunc& f, unsigned int a) { int acum=0; for(unsigned int i=0; i<a; i++) { acum+=f(i); } return acum; } template<typename TFunc> int sum(TFunc const& f, unsigned int a) { return ::sum_aux(f, a); } template<typename TFunc> int sum(TFunc& f, unsigned int a) { return ::sum_aux(f, a); } void test() { ::sum(::func, 1000); ::sum(&::func, 1000); ::sum(::func_t(), 1000); } Looks working well under gcc and vc. But, the forwarding problem will come... -- Shunsuke Sogame

Pavol Droba wrote:
Hi,
First of all, I'd like to thank you for the nice benchmarks.
Now my comments to your patches:
1.) const Predicate& Pred Although this change can bring a performance benefits, it has one big drawback. You cannot pass an ordinary C function as predicate. At least not in VC++ 7.1
I consider this is to be very important functionality, therefor it is implemented the way it is.
VC++ is correct here: in order to bind to that signature, one would have to create a "reference to const-function" type, but const-qualified functions are illegal types. However, I believe there may be a DR that *would* make this legal: gcc tends to be at the forefront of implementing these even if they haven't made it into an official std yet.
2.) std::locale()
I'm not a big fan of global objects when they are not necessary. Correct locale implementation should give only a very small footprint when instantiating std::locale(), since is all should be reference based. As seen from your benchmarks, there is great difference between g++ and msvc. On the other hand, the difference on the vc++ is so large, that it is worth considering.
In theory passing a locale by value should be cheap, and it's distressing to see that it's not. Note that caching a global locale object and doing something like: foo(const std::locale& l = get_cached_locale_object()); is *not* an option: the global locale can be changed at any time by calling std::locale::global, which would make the locales returned by get_cached_locale_object() and std::locale() non-equivalent. Sorry for the bad news :-( John.

Pavol Droba wrote:
Now my comments to your patches:
1.) const Predicate& Pred Although this change can bring a performance benefits, it has one big drawback. You cannot pass an ordinary C function as predicate. At least not in VC++ 7.1
Would it be possible to use the argument_traits library to determine the most efficient way of passing the argument? Or would that destroy the deduction of template arguments? Sebastian

Sebastian Redl wrote:
Pavol Droba wrote:
Now my comments to your patches:
1.) const Predicate& Pred Although this change can bring a performance benefits, it has one big drawback. You cannot pass an ordinary C function as predicate. At least not in VC++ 7.1
Would it be possible to use the argument_traits library to determine the most efficient way of passing the argument? Or would that destroy the deduction of template arguments?
Do you mean call_traits? As far as I know, they don't have special support for function pointer. In addition, they cannot be directly used on function templates. Any such a facility needs to know the type of the argument beforehand, so it can be transformed into result type. This is not a case in function templates. The only generic way I'm aware of is the usage of enable_if. But this would make the interface quite messy. Especialy when there is quite a lot of functions which are affected. Anyway, I think, I know a reasonable compromise. I will leave function definitions as they are now, but internaly I will call them with boost::ref. Example: Now we have template<typename SequenceT, typename PredicateT> void trim_if(SequenceT Seq, PredicateT Pred); and template<typename SequenceT> void trim(SequenceT Seq) { trim_if(Seq, is_space()); } I can change the second one to something like this: trim_if(Seq, boost::ref(is_space())); So it will go without copy penalty. Is this reasonable enough? Regards, Pavol
participants (6)
-
David Manura
-
Janek Kozicki
-
John Maddock
-
Pavol Droba
-
Sebastian Redl
-
Shunsuke Sogame