[xpressive] fuzzy on smatch fields
<alert comment="xpression newbie">
Xpressive looks very promising for some things I'm trying to implement. Thanks for providing and supporting it.

I was trying to figure out the fields that make up xpressive smatch objects, and expanded Example 1 to be more verbose. Basically, I tried to "unpack" as much information as I could find in the variable "what" to clear up some fuzziness on my part. I have some questions:

* The suffix and prefix info seemed blank. Are there accessors that would give results matching my (possibly flawed) understanding of the docs?
* The return value of regex_id() seemed to be an address (like 00323F58). Is that intended? Is there some accessor that gives something more meaningful (though I'm not clear what would be meaningful)? What is the purpose of regex_id() for a user of xpressive?
* With vc7.1, there were warnings unless I cast the lengths and positions. Is this intended?

#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;

void example1_verbose()
{
    std::string hello( "hello world!" );
    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
    smatch what;

    if( regex_match( hello, what, rex ) )
    {
        std::cout << "Overall Size: " << static_cast<int>(what.size()) << '\n';
        std::cout << "Regex Id: " << what.regex_id() << '\n';

        std::cout << "what[0]: " << what[0] << '\n';
        std::cout << "Length(0): " << static_cast<int>(what.length(0)) << '\n';
        std::cout << "Position(0): " << static_cast<int>(what.position(0)) << '\n';
        std::cout << "Str(0): " << what.str(0) << '\n';

        std::cout << "what[1]: " << what[1] << '\n';
        std::cout << "Length(1): " << static_cast<int>(what.length(1)) << '\n';
        std::cout << "Position(1): " << static_cast<int>(what.position(1)) << '\n';
        std::cout << "Str(1): " << what.str(1) << '\n';

        std::cout << "what[2]: " << what[2] << '\n';
        std::cout << "Length(2): " << static_cast<int>(what.length(2)) << '\n';
        std::cout << "Position(2): " << static_cast<int>(what.position(2)) << '\n';
        std::cout << "Str(2): " << what.str(2) << '\n';

        std::cout << "Prefix(): " << what.prefix() << '\n';
        std::cout << "Prefix().matched: " << what.prefix().matched << '\n';
        std::cout << "Prefix().length(): " << static_cast<int>(what.prefix().length()) << '\n';
        std::cout << "Prefix().str(): " << what.prefix().str() << '\n';

        std::cout << "Suffix(): " << what.suffix() << '\n';
        std::cout << "Suffix().matched: " << what.suffix().matched << '\n';
        std::cout << "Suffix().length(): " << static_cast<int>(what.suffix().length()) << '\n';
        std::cout << "Suffix().str(): " << what.suffix().str() << '\n';
    }
}

Example 1: Verbose:
Overall Size: 3
Regex Id: 00323F58
what[0]: hello world!
Length(0): 12
Position(0): 0
Str(0): hello world!
what[1]: hello
Length(1): 5
Position(1): 0
Str(1): hello
what[2]: world
Length(2): 5
Position(2): 6
Str(2): world
Prefix():
Prefix().matched: 0
Prefix().length(): 0
Prefix().str():
Suffix():
Suffix().matched: 0
Suffix().length(): 0
Suffix().str():
</alert>
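The field-by-field dump above can be written more compactly as a loop over the sub-matches. The sketch below uses the standard library's std::regex as a stand-in for xpressive, assuming that xpressive's smatch offers the same TR1-style accessors used in the example (size(), str(n), position(n), length(n)); SubMatchInfo and describe_match are hypothetical names. It stores positions and lengths in the match type's own difference_type instead of casting to int; whether that avoids the specific vc7.1 warnings is untested.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One row per sub-match (hypothetical helper type for this sketch).
struct SubMatchInfo {
    std::string text;
    std::smatch::difference_type position;
    std::smatch::difference_type length;
};

// Collects every sub-match of a whole-string match. Index 0 is the whole
// match, indices 1..size()-1 are the capture groups; an empty vector means
// the regex did not match.
std::vector<SubMatchInfo> describe_match(const std::string& input,
                                         const std::regex& rex) {
    std::vector<SubMatchInfo> rows;
    std::smatch what;
    if (std::regex_match(input, what, rex)) {
        for (std::smatch::size_type i = 0; i < what.size(); ++i) {
            // position() and length() return difference_type, so no cast
            // to int is needed to store them.
            rows.push_back({what.str(i), what.position(i), what.length(i)});
        }
    }
    return rows;
}
```

For "hello world!" and the pattern (\w+) (\w+)!, this yields three rows: the whole match at position 0 with length 12, "hello" at position 0 with length 5, and "world" at position 6 with length 5, the same values as the verbose output.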
Oops ... disregard ... red face. Example 1 uses "match" rather than "search", so I suppose prefix and suffix would not apply.

But in Example 2, it seems like "year" should be repeat<1,4>, yet year= repeat<1,2> works:

cregex date = (month= repeat<1,2>(_d))        // find the month ...
    >> (delim= (set= '/','-'))                // followed by a delimiter ...
    >> (day= repeat<1,2>(_d)) >> delim        // and a day followed by the same delimiter ...
    >> (year= repeat<1,2>(_d >> _d));         // and the year.
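Prefix and suffix only carry text when the match does not cover the whole input, i.e. with a search rather than a full match. A minimal sketch, again using std::regex as a stand-in for xpressive and a hypothetical helper match_context, contrasts the two; the date pattern in the usage check, (\d{1,2})([/-])(\d{1,2})\2((?:\d\d){1,2}), is an assumed dynamic-syntax rendering of the Example 2 regex.

```cpp
#include <cassert>
#include <regex>
#include <string>

// What lies around a match (hypothetical helper type for this sketch).
struct MatchContext {
    bool found;
    std::string prefix;
    std::string suffix;
    bool prefix_matched;  // the `matched` flag of prefix()
    bool suffix_matched;  // the `matched` flag of suffix()
};

// search == false: the regex must cover the whole input (regex_match).
// search == true:  the regex may match anywhere in the input (regex_search).
MatchContext match_context(const std::string& input, const std::regex& rex,
                           bool search) {
    std::smatch what;
    const bool found = search ? std::regex_search(input, what, rex)
                              : std::regex_match(input, what, rex);
    if (!found) {
        return {false, "", "", false, false};
    }
    return {true, what.prefix().str(), what.suffix().str(),
            what.prefix().matched, what.suffix().matched};
}
```

With a full match on "hello world!", prefix and suffix are empty and both matched flags are false, which is the blank output seen in Example 1. Searching "I was born on 5/30/1973 at 7am." for the date gives the prefix "I was born on " and the suffix " at 7am.".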
Actually, repeat<1,3> works for month, day, and year. Am I mixed up on what "repeat" means?

cregex date = (month= repeat<1,3>(_d))        // find the month ...
    >> (delim= (set= '/','-'))                // followed by a delimiter ...
    >> (day= repeat<1,3>(_d)) >> delim        // and a day followed by the same delimiter ...
    >> (year= repeat<1,3>(_d >> _d));         // and the year.
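In xpressive's static syntax, repeat<n,m>(X) corresponds to X{n,m} in Perl-style syntax, so the bounds count repetitions of X, not characters: repeat<1,2>(_d) corresponds to \d{1,2}, and repeat<1,2>(_d >> _d) to (?:\d\d){1,2}. The sketch below, using std::regex as a stand-in and a hypothetical helper accepted_lengths, lists which digit-run lengths each variant matches in full.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns every run length n in [1, max_len] for which a string of n digits
// is matched in full by `pattern` (hypothetical helper for this sketch).
std::vector<std::size_t> accepted_lengths(const std::string& pattern,
                                          std::size_t max_len) {
    const std::regex rex(pattern);
    std::vector<std::size_t> lengths;
    for (std::size_t n = 1; n <= max_len; ++n) {
        const std::string digits(n, '7');  // "7", "77", "777", ...
        if (std::regex_match(digits, rex)) {
            lengths.push_back(n);
        }
    }
    return lengths;
}
```

Up to six digits, \d{1,2} accepts lengths 1 and 2, \d{1,3} accepts 1 to 3, (?:\d\d){1,2} accepts 2 and 4, and (?:\d\d){1,3} accepts 2, 4 and 6. A four-digit year and one- or two-digit months and days fall inside every variant, so both bounds succeed on the sample input.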
Eric Niebler wrote:
repeat<n,m>(X) means to match X between n and m times, inclusive. So to match a month or a day, you want repeat<1,2>(_d) to match one or two digit characters, and to match a year, you want repeat<1,2>(_d >> _d) to match two digits or four digits. Three digits isn't a common representation of a year.

OK, and thanks for your patient assistance. I think I see why repeat<1,2> works for yyyy, but AFAICT, repeat<1,3> also "worked" for the day (dd) and month (mm), which seems off. I changed the sample code "just to see what would happen" and was scratch-my-head surprised to get the same results from repeat<1,3> as from repeat<1,2>: days and months were still "captured". But I'm probably doing something wrong or "just don't get it" about xpressive. Here is the "tweaked" Example 2 using repeat<1,3>:

#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;

void example2()
{
    char const *str = "I was born on 5/30/1973 at 7am.";

    // define some custom mark_tags with names more meaningful than s1, s2, etc.
    mark_tag day(1), month(2), year(3), delim(4);

    // this regex finds a date
    cregex date = (month= repeat<1,3>(_d))        // find the month ...
        >> (delim= (set= '/','-'))                // followed by a delimiter ...
        >> (day= repeat<1,3>(_d)) >> delim        // and a day followed by the same delimiter ...
        >> (year= repeat<1,3>(_d >> _d));         // and the year.

    cmatch what;

    if( regex_search( str, what, date ) )
    {
        std::cout << "LdaExample2" << '\n';
        std::cout << what[0] << '\n';       // whole match
        std::cout << what[day] << '\n';     // the day
        std::cout << what[month] << '\n';   // the month
        std::cout << what[year] << '\n';    // the year
        std::cout << what[delim] << '\n';   // the delimiter
    }
}
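The sample sentence cannot tell the two bounds apart, because its month and day have one and two digits and its year has four. A minimal sketch, again with std::regex standing in for xpressive and with hypothetical helpers date_regex and find_date, runs an assumed dynamic-syntax analogue of the date pattern with both bounds, on the sample sentence and on a made-up input with a three-digit day. Note that std::regex numbers capture groups by position, not by the mark_tag numbers used in Example 2.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Dynamic-syntax analogue of the Example 2 date regex (an assumption for this
// sketch), with the upper bound of every repeat passed in. Groups are
// numbered by position: 1 = month, 2 = delimiter, 3 = day, 4 = year.
std::regex date_regex(int max_count) {
    const std::string m = std::to_string(max_count);
    return std::regex("(\\d{1," + m + "})"          // month: 1..max_count digits
                      "([/-])"                      // delimiter
                      "(\\d{1," + m + "})"          // day: 1..max_count digits
                      "\\2"                         // the same delimiter again
                      "((?:\\d\\d){1," + m + "})"); // year: 1..max_count digit pairs
}

// Returns the whole date found by a search, or "" if none is found.
std::string find_date(const std::string& text, int max_count) {
    const std::regex rex = date_regex(max_count);
    std::smatch what;
    if (std::regex_search(text, what, rex)) {
        return what.str(0);
    }
    return "";
}
```

Both bounds find "5/30/1973" in the sample sentence, but only the repeat<1,3> analogue accepts "5/300/1973", which is the extra permissiveness of the wider bound.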
Lynn Allan wrote:
I think I see why repeat<1,2> works for yyyy, but AFAICT, repeat<1,3> also "worked" for the day (dd) and month (mm), which seems off. I changed the sample code "just to see what would happen" and was scratch-my-head surprised to get the same results from repeat<1,3> as from repeat<1,2>: days and months were still "captured".

repeat<1,3>(_d) will match one digit, two digits, or three digits. So, yes, it will match days (which are one or two digits) or months (which are one or two digits). However, it is overly permissive, because it will also match three digits, which is not a valid day or month.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
Sorry ... this regex newbie is VERY red-faced. You are very gracious.
participants (2)
- Eric Niebler
- Lynn Allan