
Stefan wrote:
* Optimize Boost String Algorithm finders to use an efficient substring matching algorithm as opposed to the naive algorithm. Most likely because of the fact that boost has implemented the finders for strings as generic algorithms, they haven't used efficient algorithms such as KMP, Boyer-Moore, Rabin-Karp etc. \see boost/algorithm/string/detail/finder.hpp The idea is to implement efficient algorithms such as the ones mentioned above to replace the current naive implementation. Note: it may be difficult to find nth matching substring position using some of these algorithms.
I don't have an application proving it's inefficient yet, but a O(nm) algorithm would almost always perform worse than a O(n+m) one.
So n = size of input and m = size of search pattern. If m is typically small, e.g. 2, then O(n+m) could easily beat O(nm) if the constant factors are in its favour. Even if m is large, then there will often be a dominant case where the potential match can be rejected after looking at just the first element (i.e. if I'm looking for xyz in abcdefgxyzlmnop, then a "worse case O(nm)" algorithm will actually take O(n)). I think that having a test data set in mind in advance would make this much more concrete to reason about. Does anyone have any suggestions?
I've just looked at the posted thread, but the algorithm is_any_of only looks for a character in a string, not for a substring in a string.
I guess my point was that even the simpler problem of finding a single character has lots of potential for interesting work. Regards, Phil.