
Apologies for being way behind with your messages - I haven't had a chance to examine your code yet.
Two main slowdown factors are (1) excessive tag copying and (2) nondeterminism. I can name two ways to reduce (1). The factor (2) is all about ANFA to ADFA conversion.
Copying, is always a real killer, unless you can elliminate memory allocations. More than anything else these will soak up cycles, especially on multi-processor systems where the locking can really kill performance (imagine multiple threads all calling new/delete constantly).
These optimizations are subject to additional research.
And, of course, there are additional speed reserves in KMP for search problems.
Actually, I think I may remove that code as a dis-optimisation, a simple scan for first character is probably faster in the *average* case, and of course if you can do a Boyer-search that's even better. Like you, more research is needed! John.