
Thomas Klimpel wrote:
> Phil Endecott wrote:
>> I think that having a test data set in mind in advance would make this much more concrete to reason about. Does anyone have any suggestions?
>
> Some people regard using an algorithm with a really bad complexity as some sort of bug, no?
Yes, indeed!
> So the question would be how this "bug" can be exposed, in case it really is a bug. One suggestion in this direction might be a denial-of-service attack. But in my own experience, such "bugs" tend to cause damage in more subtle and unexpected ways. Perhaps somebody can answer the opposite question for me: what is the advantage of having subtle bugs in your algorithms that only wait to bite you when you least expect it?
Well, the not-100%-serious answer would be "because they might be faster the rest of the time".

I think these algorithms would be useful contributions to Boost. But because their complexity benefits may not show up for common usage patterns (e.g. short search patterns), I think that studying their actual performance with some realistic test data (and not synthetic pseudo-random patterns!) should be part of the project.

I also wonder about other variations, e.g. avoiding recomputing the table in the KMP or Boyer-Moore algorithms for repeated searches with the same pattern:

    string text = content_of_file("long_text.txt");
    kmp_pattern pat("search for this");  // compute the table here
    string::const_iterator i = text.begin();
    while (i != text.end()) {
      i = kmp_search(i, text.end(), pat);
      ...
    }

If N = the length of the text, M = the number of matches and P = the length of the pattern, I think this reduces the complexity from O(N+MP) to O(N+P). Do this with TMP and a compile-time pattern, and you've got something xpressive-like.

But in my experience, when I have had to do fast searching it has always been repeated searches for different patterns, so I've used some sort of index. Stephan mentioned suffix arrays, and I think that would be an even more valuable contribution.

Regards, Phil.
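P.S. The kmp_pattern / kmp_search names above are just placeholders I invented for the example. To make the idea concrete, here is a rough, untested sketch (not a proposed interface) of a pattern object that pays the O(P) table construction once in its constructor and then reuses the table on every search:

    #include <cstddef>
    #include <string>
    #include <vector>

    // Illustrative sketch only: holds the pattern together with its KMP
    // failure table, so the O(P) table construction is paid once per
    // pattern rather than once per search.
    class kmp_pattern {
    public:
        explicit kmp_pattern(const std::string& p) : pat_(p), fail_(p.size(), 0) {
            // Standard KMP failure function: fail_[i] is the length of the
            // longest proper prefix of pat_[0..i] that is also a suffix.
            std::size_t k = 0;
            for (std::size_t i = 1; i < pat_.size(); ++i) {
                while (k > 0 && pat_[k] != pat_[i]) k = fail_[k - 1];
                if (pat_[k] == pat_[i]) ++k;
                fail_[i] = k;
            }
        }
        const std::string& str() const { return pat_; }
        const std::vector<std::size_t>& fail() const { return fail_; }
    private:
        std::string pat_;
        std::vector<std::size_t> fail_;
    };

    // Returns an iterator to the start of the first occurrence of pat in
    // [first, last), or last if there is none.  Each call only scans the
    // text; the precomputed table is reused.
    std::string::const_iterator
    kmp_search(std::string::const_iterator first,
               std::string::const_iterator last,
               const kmp_pattern& pat)
    {
        const std::string& p = pat.str();
        if (p.empty()) return first;
        std::size_t k = 0;  // number of pattern characters currently matched
        for (; first != last; ++first) {
            while (k > 0 && p[k] != *first) k = pat.fail()[k - 1];
            if (p[k] == *first) ++k;
            if (k == p.size())
                return first - static_cast<std::ptrdiff_t>(p.size() - 1);
        }
        return last;
    }

With something like this, the only per-search cost is the scan of the searched range, which is where the O(N+MP) to O(N+P) saving comes from; a compile-time variant would move the table construction into the type of the pattern.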