Re: [Boost-users] [Spirit-general] Pattern matching with boost
Thank you very much everybody!
I am currently building boost (as my current boost was under a
different toolset), but once it's built I'll be able to test.
Thank you all for your code samples + advice, especially Seth (sehe)
on stackoverflow :]
On Thu, Nov 10, 2011 at 1:15 PM, Seth Heeren wrote:
On 11/09/2011 10:31 PM, Alec Taylor wrote:
Feel free to add it to stackoverflow (I've been having some trouble posting there).
I can't give you what I've done so far, because I wasn't sure if it was within the capabilities of the boost::spirit libraries to do what I'm trying to do.
Alec, since you came to the Spirit list, it's only fair that I try to show you a Spirit way. It uses Qi for parsing and Karma for output generation. As you consented, I posted it on StackOverflow on your behalf:
http://stackoverflow.com/questions/8074103/how-would-i-perform-this-text-pat...
So by all means, have a look there (and don't forget to upvote it if the answer was in any way helpful to your question; I think it at least comprehensively answers the main question, "is it within the capabilities of the boost::spirit libraries?"). Whether or not you elect to do so is up to you, of course!
Cheers, Seth
------------------------------------------------------------------------------ RSA(R) Conference 2012 Save $700 by Nov 18 Register now http://p.sf.net/sfu/rsa-sfdev2dev1 _______________________________________________ Spirit-general mailing list Spirit-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spirit-general
Would it be possible to abstract away the terms "apple" and "cheese"
to just equal anything with that pattern?
So like this:
s1 = name1->consecutive-number
s2 = consecutive-number->name2
s3 = name1->consecutive-number
s4 = consecutive-number->name2
Thanks for all suggestions,
Alec Taylor
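Here is a minimal sketch of the four-string case. It rests on three assumptions. Each string holds exactly one term, either name->number or number->name. A name is a run of ASCII letters. "Consecutive" means n, n+1, n+2, n+3 across s1..s4.

The function name matches_four is hypothetical. The sketch uses C++11 std::regex rather than Boost; boost::regex offers regex_match and smatch with essentially the same interface. The regex only captures the names and numbers. The arithmetic check has to happen in ordinary C++, because a regular expression cannot compare n with n+1.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s1..s4 follow the pattern
//   s1 = name1->n, s2 = (n+1)->name2, s3 = name1->(n+2), s4 = (n+3)->name2
// Assumption: names are ASCII letters only; numbers are 1-9 decimal digits.
bool matches_four(const std::string& s1, const std::string& s2,
                  const std::string& s3, const std::string& s4)
{
    static const std::regex name_num("([A-Za-z]+)->([0-9]{1,9})");
    static const std::regex num_name("([0-9]{1,9})->([A-Za-z]+)");

    std::smatch m1, m2, m3, m4;
    if (!std::regex_match(s1, m1, name_num) || !std::regex_match(s2, m2, num_name) ||
        !std::regex_match(s3, m3, name_num) || !std::regex_match(s4, m4, num_name))
        return false;  // at least one string has the wrong shape

    const long n = std::stol(m1[2].str());
    return m1[1].str() == m3[1].str()          // same name1 in s1 and s3
        && m2[2].str() == m4[2].str()          // same name2 in s2 and s4
        && std::stol(m2[1].str()) == n + 1     // numbers run n, n+1, n+2, n+3
        && std::stol(m3[2].str()) == n + 2
        && std::stol(m4[1].str()) == n + 3;
}
```

With this sketch, matches_four("apple->7", "8->cheese", "apple->9", "10->cheese") returns true. Changing the third string to "pear->9" makes it return false.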
Alec Taylor writes:
Would it be possible to abstract away the terms "apple" and "cheese" to just equal anything with that pattern?
So like this:
s1 = name1->consecutive-number
s2 = consecutive-number->name2
s3 = name1->consecutive-number
s4 = consecutive-number->name2
Just what pattern do you mean, though? You've used "name" and "string1". Is there a character class or other short regexp that would match what you mean by "name" below?
And do you already have the 4 strings, or are you scanning a continuous piece of text? If the latter, you're suddenly in the realm of wanting a state machine (because if s4 fails to match, then you might have to consider s3 the new s1 and s4 the new s2...)
Is "consecutive" only within the 4 strings, or is it actually cumulative from some other value? (This is a big part of what makes this difficult to do with boost or any other C/C++-based RE engine. Perl allows one to actually do evaluation within the search string, so you really could match this all with a single RE there -- but it'd be ugly, and I don't really know how efficient it would be.)
Curious, T.
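Tony's remark about scanning a continuous piece of text can be sketched as a sliding window over a sequence of strings. Each string is parsed once and every run of four is tested. A failed window simply advances by one, so a later pair of strings can take over the roles of s1 and s2.

This is a hypothetical sketch; Term, parse_term and find_run are illustrative names. It assumes the name->number / number->name shapes from Alec's message, letter-only names, and n, n+1, n+2, n+3 as the meaning of "consecutive". It avoids the Perl-style in-pattern evaluation Tony mentions by doing the arithmetic outside the regex.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One parsed string: "name->number" (name_first) or "number->name".
struct Term {
    bool valid;
    bool name_first;
    std::string name;
    long number;
};

// Assumption: names are ASCII letters only; numbers are 1-9 decimal digits.
Term parse_term(const std::string& s)
{
    static const std::regex name_num("([A-Za-z]+)->([0-9]{1,9})");
    static const std::regex num_name("([0-9]{1,9})->([A-Za-z]+)");
    Term t = {false, false, "", 0};
    std::smatch m;
    if (std::regex_match(s, m, name_num))
        t = Term{true, true, m[1].str(), std::stol(m[2].str())};
    else if (std::regex_match(s, m, num_name))
        t = Term{true, false, m[2].str(), std::stol(m[1].str())};
    return t;
}

// Hypothetical: index of the first window of four strings matching
//   name1->n, (n+1)->name2, name1->(n+2), (n+3)->name2, or -1 if none does.
long find_run(const std::vector<std::string>& lines)
{
    std::vector<Term> t;
    t.reserve(lines.size());
    for (const std::string& s : lines)
        t.push_back(parse_term(s));  // parse each string only once

    for (std::size_t i = 0; i + 4 <= t.size(); ++i) {
        const Term& a = t[i];
        const Term& b = t[i + 1];
        const Term& c = t[i + 2];
        const Term& d = t[i + 3];
        if (a.valid && b.valid && c.valid && d.valid &&
            a.name_first && !b.name_first && c.name_first && !d.name_first &&
            a.name == c.name && b.name == d.name &&
            b.number == a.number + 1 && c.number == a.number + 2 &&
            d.number == a.number + 3)
            return static_cast<long>(i);
        // Otherwise slide by one: t[i+2] and t[i+3] may start the next match.
    }
    return -1;
}
```

For example, for the sequence "pear->1", "2->fig", "apple->7", "8->cheese", "apple->9", "10->cheese" the first window fails. The window starting at index 2 succeeds.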
I have a larger code-base, in which I have restricted possible matches
to 4 std::strings.
Basically all I know is there MIGHT exist a name1 with form
name1->number1 in s1, where name1 == name1 in s3 and s3 contains
name1->number1+2.
Perhaps look at an s5 (number1+4) to confirm the pattern.
IF a pattern like this exists, return s1.substr(<whatever the matched area is>)
(sorry if I didn't explain this more clearly before)
How would I do this in boost?
Thanks for all suggestions,
Alec Taylor
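One possible sketch of this narrower requirement covers name1->number1 somewhere in s1, name1->number1+2 in s3, and optionally name1->number1+4 in s5 as confirmation. It walks every name->number term in s1 with std::sregex_iterator and returns the matched area through the match's position and length, i.e. as s1.substr(...).

find_pattern and contains_term are hypothetical names. Two details are assumptions, not part of Alec's description: the term syntax (letters, "->", up to nine digits) and the convention that an empty s5 skips the confirmation step.

```cpp
#include <regex>
#include <string>

namespace {
// Assumption: a term is ASCII letters, "->", then 1-9 decimal digits,
// e.g. "apple->12"; any other text in the string is ignored.
const std::regex term_re("([A-Za-z]+)->([0-9]{1,9})");

// True if s contains the term name->number as a whole term.
bool contains_term(const std::string& s, const std::string& name, long number)
{
    for (std::sregex_iterator it(s.begin(), s.end(), term_re), end; it != end; ++it)
        if ((*it)[1].str() == name && std::stol((*it)[2].str()) == number)
            return true;
    return false;
}
}  // namespace

// Hypothetical helper: find name1->n in s1 such that s3 holds name1->(n+2)
// and, if s5 is non-empty, s5 holds name1->(n+4).
// Returns the matched area of s1, or "" if there is none.
std::string find_pattern(const std::string& s1, const std::string& s3,
                         const std::string& s5 = "")
{
    for (std::sregex_iterator it(s1.begin(), s1.end(), term_re), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::string name = m[1].str();
        const long n = std::stol(m[2].str());
        if (contains_term(s3, name, n + 2) &&
            (s5.empty() || contains_term(s5, name, n + 4)))
            return s1.substr(static_cast<std::string::size_type>(m.position(0)),
                             static_cast<std::string::size_type>(m.length(0)));
    }
    return "";
}
```

With this sketch, find_pattern("foo apple->3 bar", "x apple->5 y") returns "apple->3". Passing "apple->8" as s5 would reject the same candidate.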
Here is a really simple explanation I just figured out to explain the problem I am trying to solve:
std::string s1=garbagetext1+number1+name1+garbagetext4;
std::string s3=garbagetext2+(number1+2)+name1+garbagetext5;
std::string s5=garbagetext3+(number1+4)+name1+garbagetext6;
If this pattern is found:
return s1.substr(number1+name1);
How can I do this using boost [or other] libraries?
Thanks for all suggestions,
Alec Taylor
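Below is one reading of this version, sketched under explicit assumptions:

- A candidate in s1 is a run of up to nine digits immediately followed by letters (e.g. 12apple).
- Garbage text never touches a candidate with a digit on its left or a letter on its right.
- Every candidate in s1 is checked, since s1 may contain several.

A regex cannot compute number1+2. So the sketch captures number1 and name1, builds the expected tokens (number1+2)+name1 and (number1+4)+name1 as plain strings, and looks for them in s3 and s5. find_candidates and has_token are hypothetical names.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// True if token (digits followed by letters) occurs in s as a whole unit:
// not preceded by a digit and not followed by a letter.
static bool has_token(const std::string& s, const std::string& token)
{
    for (std::string::size_type pos = s.find(token); pos != std::string::npos;
         pos = s.find(token, pos + 1)) {
        const std::string::size_type after = pos + token.size();
        const bool clean_left =
            pos == 0 || !std::isdigit(static_cast<unsigned char>(s[pos - 1]));
        const bool clean_right =
            after == s.size() || !std::isalpha(static_cast<unsigned char>(s[after]));
        if (clean_left && clean_right)
            return true;
    }
    return false;
}

// Hypothetical helper: every number+name candidate in s1 (e.g. "12apple")
// for which s3 holds (number+2)+name and s5 holds (number+4)+name.
// Assumption: garbage text is separated from a candidate by a non-digit
// on the left and a non-letter on the right.
std::vector<std::string> find_candidates(const std::string& s1,
                                         const std::string& s3,
                                         const std::string& s5)
{
    static const std::regex cand("([0-9]{1,9})([A-Za-z]+)");
    std::vector<std::string> found;
    for (std::sregex_iterator it(s1.begin(), s1.end(), cand), end; it != end; ++it) {
        const long n = std::stol((*it)[1].str());
        const std::string name = (*it)[2].str();
        if (has_token(s3, std::to_string(n + 2) + name) &&
            has_token(s5, std::to_string(n + 4) + name))
            found.push_back(it->str());  // the matched number1+name1 area of s1
    }
    return found;
}
```

Returning every candidate, rather than only the first, leaves it to the caller to decide what several hits in one string should mean.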
On Nov 9, 2011, at 21:49, Alec Taylor wrote:
Here is a really simple explanation I just figured out to explain the problem I am trying to solve:
std::string s1=garbagetext1+number1+name1+garbagetext4;
std::string s3=garbagetext2+(number1+2)+name1+garbagetext5;
std::string s5=garbagetext3+(number1+4)+name1+garbagetext6;
If this pattern is found:
return s1.substr(number1+name1);
Ok, now you've changed the requirements again: every previous message talked about four consecutive messages.
How can I do this using boost [or other] libraries?
Have you even tried my code?
http://article.gmane.org/gmane.comp.lib.boost.user/71262
It seems a bit rude to ask for suggestions and then [apparently] ignore them. If it doesn't work, do you understand it well enough that you could try to alter it? (Hint: look at str_pat, although see point [2] below.) If you don't understand it, what don't you understand? I'll be happy to try to make it more obvious. I'll be less happy trying to help you further when you don't seem able to provide a reasonable and consistent set of requirements.
Some corner cases to think about:
1. Can you distinguish "garbage text" from a "name"? If so, how? Character set used? Spaces? Predefined set of names?
2. Can you get multiple candidates in each string? E.g., s1 = "foo 1 apple 2 pear bar"; s2 = "baz 3 orange 3 pear quux";
3. Is it 4 strings, or 3 strings (skipping intermediate strings), or...?
4. How large are these strings? If they're particularly large, then efficiency might have to trump elegance/readability. "Particularly large" varies with time; for modern hardware, I wouldn't start worrying until the strings are ~MiB each, depending on how many sets of 3 (or 4 or 5 or 2 or...) we have to match.
Finally, try "test driven development":
A. Provide one or more sets of strings that are supposed to match, and what the correct output would be;
B. Provide as many sets as you can think of that *shouldn't* match. Look for corner cases: empty strings, repetitions, off-by-one errors, "name" matching but number not in sequence, in sequence but names not matching, partial matches, upper/lower case, etc.
At this point, I feel that you have a handful of suggestions, each of which is "correct" for some interpretation of the requirements you've given. But until you've gone through those solutions and determined what works (and what doesn't), it's not clear that we can help you much more. That is, this sounds like a case where you need to sit down and make your requirements a *lot* more stringent. It doesn't have much to do with which library you use.
Best regards,
Tony
p.s. Apologies if this message is not formatted well -- this is not my normal/preferred mail interface.
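Tony's test-driven suggestion (A: sets that must match, with the expected output; B: sets that must not) maps naturally onto a table of cases. Everything below is hypothetical and only illustrates the shape. split_word, has_word, matched_area, Case and count_failures are illustrative names.

The matcher assumes whitespace-separated words, where a candidate is digits immediately followed by letters (e.g. 12apple) and s1, s3 and s5 step by 2. Each negative row targets one of the corner cases from Tony's list.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split a word like "12apple" into 12 and "apple"; false for any other shape.
static bool split_word(const std::string& w, long& n, std::string& name)
{
    std::string::size_type i = 0;
    while (i < w.size() && std::isdigit(static_cast<unsigned char>(w[i]))) ++i;
    if (i == 0 || i > 9 || i == w.size()) return false;  // need 1-9 digits, then letters
    for (std::string::size_type j = i; j < w.size(); ++j)
        if (!std::isalpha(static_cast<unsigned char>(w[j]))) return false;
    n = std::stol(w.substr(0, i));
    name = w.substr(i);
    return true;
}

// True if s contains word as a whole whitespace-separated word.
static bool has_word(const std::string& s, const std::string& word)
{
    std::istringstream in(s);
    for (std::string w; in >> w; )
        if (w == word) return true;
    return false;
}

// Matcher under test: first word N+name in s1 with (N+2)+name in s3 and
// (N+4)+name in s5; returns that word, or "" if there is none.
std::string matched_area(const std::string& s1, const std::string& s3,
                         const std::string& s5)
{
    std::istringstream in(s1);
    for (std::string w; in >> w; ) {
        long n;
        std::string name;
        if (split_word(w, n, name) &&
            has_word(s3, std::to_string(n + 2) + name) &&
            has_word(s5, std::to_string(n + 4) + name))
            return w;
    }
    return "";
}

struct Case { std::string s1, s3, s5, expected; };

// Table-driven check: positive rows expect a match, negative rows expect "".
int count_failures()
{
    const std::vector<Case> cases = {
        {"junk 3apple junk", "x 5apple y", "7apple z", "3apple"},  // A: must match
        {"1apple 2pear", "3apple 3pear", "5apple 6pear", "1apple"}, // several candidates
        {"", "", "", ""},                                           // B: empty strings
        {"3apple", "5pear", "7apple", ""},                          // names differ
        {"3apple", "6apple", "7apple", ""},                         // off by one
        {"3apple", "15apple", "7apple", ""},                        // partial number
        {"3apple", "5Apple", "7apple", ""},                         // upper/lower case
    };
    int failures = 0;
    for (const Case& c : cases)
        if (matched_area(c.s1, c.s3, c.s5) != c.expected) ++failures;
    return failures;
}
```

Because the table is separate from the matcher, a different interpretation of the requirements can be tried by swapping matched_area while keeping the same cases.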
participants (3)
- Alec Taylor
- Anthony Foiani
- Tkil