Yes, I think there was some misunderstanding here, and I think it comes from your definition of a mistake. For me, a mistake is as follows: Regex: "testing", String_to_search: "tastung". The output should be that the regex "testing" was found, but with 2 mismatches, namely "a" and "u". So a mismatch is a letter that was not found. It may sound odd to you, but I am using the regex to identify genomic regions, in other words for biological applications. In some cases my regex is a piece of DNA such as "atgcta", and I want to search for it in another piece of DNA. Since "atgcta" can occur in the genome many times, I will probably get a lot of matches. But in some cases I want to search for "atgcta" while allowing some mismatches. Obviously I will get even more matches, but I think a regex can be a much more efficient way than building up alignment matrices. Any idea how to handle the example above?
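For reference, the kind of search described above (a fixed pattern, substitutions only, at most k mismatches) can be sketched without any regex engine by sliding the pattern along the sequence and counting differing letters. This is only a minimal sketch under those assumptions: the names `Hit` and `find_with_mismatches` are made up for illustration, nothing here is part of Boost.Regex, and insertions or deletions are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One approximate occurrence of the pattern inside the text (hypothetical type).
struct Hit {
    std::size_t offset;                   // 0-based start of the window in the text
    std::vector<std::size_t> mismatches;  // 0-based positions in the pattern that differ
};

// Slide the pattern across the text and keep every window that differs from
// the pattern in at most max_mismatches positions (substitutions only).
std::vector<Hit> find_with_mismatches(const std::string& text,
                                      const std::string& pattern,
                                      std::size_t max_mismatches) {
    assert(!pattern.empty());
    std::vector<Hit> hits;
    if (text.size() < pattern.size()) return hits;
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        Hit hit{start, {}};
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            if (text[start + i] != pattern[i]) {
                hit.mismatches.push_back(i);
                // Stop comparing this window once it exceeds the allowance.
                if (hit.mismatches.size() > max_mismatches) break;
            }
        }
        if (hit.mismatches.size() <= max_mismatches) hits.push_back(hit);
    }
    return hits;
}
```

Calling `find_with_mismatches("tastung", "testing", 2)` yields a single hit at offset 0 whose mismatch positions are 1 and 4 (0-based), which correspond to the letters "a" and "u". The cost is proportional to text length times pattern length, so for whole genomes a dedicated approximate-matching tool is probably the better fit.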
-----Original Message----- From: boost-users-bounces_at_[hidden] [mailto:boost-users-bounces_at_[hidden]] On Behalf Of david v Sent: Tuesday, August 29, 2006 9:08 AM To: boost-users_at_[hidden] Subject: [Boost-users] Mismatch and regex newbie problem still problem
So, to sum up: if the regex I'm looking for is "testing" and the string to search is "tastung" (obviously this is a short example, but I'm dealing with more complex regular expressions),
how can I get the number of mismatches? Basically, the output of the program would tell me:
2 mismatches found in string "tastung" at positions 2 (a) and 5 (u).
[Nat] Maybe I'm completely misunderstanding you. If so, please forgive me.

I think you're saying that you want to start with the regex "testing" and have the library detect that the string "tastung" is somehow similar but nonetheless distinct. My belief is that, given the regex "testing", the library will not recognize the string "tastung" in any way. It will simply report that no match was found.

You could construct a more complex regex that would handle this particular example. You could, for instance, say that you want to match a "t", followed by an arbitrary character, followed by "st", followed by another arbitrary character, followed by "ng". The library would report that the string "tastung" matches that regex, and you could ask it to tell you the specific substrings matching the variable parts.

But if you want to allow arbitrary variance in any character position -- as long as some other set of character positions matches -- then I'm a little perplexed as to how to express that in a regular expression. Maybe an exhaustive family of acceptable alternatives? But if you're dealing with longer expression strings, that could explode really quickly.

I think you need to get really specific about the rules you want to use to detect a "mismatch." Then you need to figure out whether the regex library is the right tool to help you apply those rules. Again, sorry if I'm way off base here.
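To make the wildcard suggestion concrete, here is a minimal sketch of the "t, any character, st, any character, ng" pattern with the two variable positions captured. It uses `std::regex` as a stand-in so that it compiles without Boost; the corresponding Boost.Regex names (`boost::regex`, `boost::smatch`, `boost::regex_search`) are used the same way. The helper name `wildcard_captures` is made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Return the characters matched by the two wildcard groups of "t(.)st(.)ng",
// or an empty vector if the subject contains no match (hypothetical helper).
std::vector<std::string> wildcard_captures(const std::string& subject) {
    static const std::regex pattern("t(.)st(.)ng");
    std::smatch m;
    std::vector<std::string> captured;
    if (std::regex_search(subject, m, pattern)) {
        // m[0] is the whole match; m[1] and m[2] are the variable parts.
        for (std::size_t i = 1; i < m.size(); ++i) {
            captured.push_back(m[i].str());
        }
    }
    return captured;
}
```

For "tastung" the captured parts are "a" and "u"; comparing them with the letters expected at those positions ("e" and "i") gives the mismatches. This only covers mismatches at those two fixed positions. Allowing any two of the seven positions of "testing" to vary already needs 21 such patterns (7 choose 2), which is the explosion mentioned above.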