Re: [boost] [String algorithm] is_any_of has inefficient implementation

15 Feb 2008

      Phil Endecott skrev:
...
Joe Gottman wrote:
...
I just discovered that the string algorithm predicate is_any_of has an 
inefficient implementation. Internally it uses a std::set.
It would be nice if things like this could be at least as efficient as 
the equivalent C functionality.  I've just done some quick benchmarks 
of possible is_any_of implementations.  My test splits my 
/usr/share/dict/words at every lower-case vowel; it's about a megabyte 
and splits into 300,000 pieces.
First using std::set:
struct stdset_is_any_of {
   std::set<char> s;
   stdset_is_any_of(string pat) {
     for(string::const_iterator i=pat.begin(); i!=pat.end(); ++i) {
       s.insert(*i);
     }
   }
   bool operator()(char c) {
     return s.find(c) != s.end();
   }
};
Relative performance is 0.6.
Fon something as light as char, I suspect std::set<char> or 
tr1::unordered_set<char> is not an optimal choice.

Could you try this test with boost::interprocess::flat_set<char> ?
(It's in <boost/interprocess/containers/flat_set.hpp>)

-Thorsten

Re: [boost] [String algorithm] is_any_of has inefficient implementation

Thorsten Ottosen