Mismatch and regex newbie problem still problem"
Hello, As James pointed out the right place to start with will be suffix tree algorithms. The only thing i don't know if this is going to be very time consuming ??? ######For Brian.... 1) What's the longest string you're searching for? For the moment is limited to 6 to 8 characters. 2) What's the smallest bit-width of any cpu on which you're running your code? I'm on 64 dual opteron machine 3) How familiar with string searching/matching are you? Quite familiar but was looking for a cpp library that could handle it instead of trying to rewrite things. Best, david
david v wrote:
Hello, As James pointed out the right place to start with will be suffix tree algorithms. The only thing i don't know if this is going to be very time consuming ???
######For Brian.... 1) What's the longest string you're searching for? For the moment is limited to 6 to 8 characters.
2) What's the smallest bit-width of any cpu on which you're running your code? I'm on 64 dual opteron machine
3) How familiar with string searching/matching are you? Quite familiar but was looking for a cpp library that could handle it instead of trying to rewrite things.
Best, david
David, In my experience, the bit-parallel algorithm of Navarro/Hyyro is significantly faster to build and smaller in memory than the suffix trees. As a third bonus, it's a faster search technique. Two papers: 1) "A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances" by Heikki Hyyro of the Dept of Computer and Information Sciences, University of Tampere Finland. 2) "Faster Bit-parallel Approximate String Matching" by Hyyro and Gonzalo Navarro. I think that paper is from the Proc. 13th Combinatorial Pattern Matching Both should show up on a google search if you use the exact title. Since you've a 64-bit machine, you could handle up to 63 characters of a search pattern at a max, with any length text to search. Apologies that I can't share my implementations with you on this, they're not for searching but for string matching. I seem to remember that the papers are oriented towards your type of searching across a large text stream. regards, and congrats in advance, Brian
If you're willing to dig into assembly and SSE instructions for your
processor, its possible you could do an exhaustive search in O(N) time (N
being the lenght of the string your searching in). I don't know too much
about the whole thing, but I do know that some processors have a vector math
function that allows you to compare up to something like 16 single byte
values in one cycle. Pretty nifty, but like I say, I haven't any idea on
how to actually code it.
Paul
On 8/29/06, david v
Hello, As James pointed out the right place to start with will be suffix tree algorithms. The only thing i don't know if this is going to be very time consuming ???
######For Brian.... 1) What's the longest string you're searching for? For the moment is limited to 6 to 8 characters.
2) What's the smallest bit-width of any cpu on which you're running your code? I'm on 64 dual opteron machine
3) How familiar with string searching/matching are you? Quite familiar but was looking for a cpp library that could handle it instead of trying to rewrite things.
Best, david
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (3)
-
Brian Allison
-
david v
-
Paul Davis