
Phil Endecott skrev:
Joe Gottman wrote:
I just discovered that the string algorithm predicate is_any_of has an inefficient implementation. Internally it uses a std::set.
It would be nice if things like this could be at least as efficient as the equivalent C functionality. I've just done some quick benchmarks of possible is_any_of implementations. My test splits my /usr/share/dict/words at every lower-case vowel; it's about a megabyte and splits into 300,000 pieces.
First using std::set:
struct stdset_is_any_of { std::set<char> s; stdset_is_any_of(string pat) { for(string::const_iterator i=pat.begin(); i!=pat.end(); ++i) { s.insert(*i); } } bool operator()(char c) { return s.find(c) != s.end(); } };
Relative performance is 0.6.
Fon something as light as char, I suspect std::set<char> or tr1::unordered_set<char> is not an optimal choice. Could you try this test with boost::interprocess::flat_set<char> ? (It's in <boost/interprocess/containers/flat_set.hpp>) -Thorsten