
Oh, and something important which I forgot to mention: Bad-character shifting requires the presence of random-access iterators in order to work efficiently. Consequently, since Boyer-Moore and Boyer-Moore-Horspool are using bad-character shifting, they are only efficient for random-access iterators. Hence KMP will be still useful, even if it is much slower in the most common cases. And a question for Marshall: I have noticed that inside the search.hpp file you have a requirement for random-access iterators regarding the Knuth-Pratt-Morris searching algorithm as well. Is this for both the pattern and the corpus? Phil Endecott wrote:
How efficiently could the current algorithms be used with DNA data? Does anyone have any experience with a packed base-4 representation?
May I propose an easy way: Why not creating random sequences, using only four char values (ie "abcd") This will easily give you a very good idea about the KMP performance. For a more elaborate aproach, you can always use the tests of Musser & Nishanov: http://www.cs.rpi.edu/~musser/gp/gensearch/index.html
Phil Endecott wrote:
However, in my opinion the most interesting and useful property of the KMP algorithm is not its performance, but its ability to help in building very generic sequence searching algorithms. In fact KMP has helped me to build an efficient generic sequence searching algorithm, which actually works with ranges accessed by single-pass (input) iterators. (See http://www.codeproject.com/KB/stl/single_pass_search.aspx and also in boost-vault/Algorithms) Needless to say that I would be more than happy to submit this work to boost, whenever there is an interest.
I am reminded that Spirit has some sort of adaptor that allows an input iterator to be used with a grammar that requires backtracking by storing recent inputs in a buffer. Presumably something like this could be used here. How much does your algorithm gain over a combination of such a buffer and one of Marshall's algorithms?
Very good idea! I will try to compare my algorithm against this adaptor. A very interesting benchmark for me thank you Phil. Best regards Jim Xochellis Homepage: https://sites.google.com/site/xochellis/