
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Yechezkel Mett
Sent: Thursday, February 10, 2011 5:41 PM
To: boost@lists.boost.org
Subject: Re: [boost] [Tokenizer]Usage and documentation
^|[\s,]
means _either_ the beginning of the line _or_ a space or comma. In other words the field starts either at the beginning of the line or after a space or comma.
Likewise
$|[\s,]
The field ends either at the end of the line or before a space or comma.
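As a small illustration of the two fragments above, here is a minimal sketch using Python's re module rather than Boost.Regex. The literal field "foo", the helper name has_field, and the (?:...) groups that scope each alternation are assumptions made for the example; the full expression these fragments came from is not shown in this message.

```python
import re

# Hypothetical example: the field "foo" must be preceded by the start of
# the line, a space or a comma, and followed by the end of the line,
# a space or a comma.
FIELD = re.compile(r'(?:^|[\s,])foo(?:$|[\s,])')


def has_field(line):
    """Return True if "foo" occurs in the line as a complete field."""
    return FIELD.search(line) is not None
```

With this pattern, "foo", "bar,foo" and "a foo,b" all contain the field, while "foobar" and "xfoo" do not, because the character next to "foo" is neither a delimiter nor a line boundary.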
I indeed never realized that ^ and $ could be used in combination with | in that way before. I didn't use RE that frequently though.
One more question - with your code, any empty 'token' between two contiguous ',' is ignored. What if someday I'd like to pick them up?
"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
I'm presuming an empty line should count as no tokens; if you don't mind an empty line being one token it can be simplified to
"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
I have three versions of the REs sitting side by side and am trying to figure out the differences between them.

"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$   // (1)
"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)       // (2)
"([^"]*)"|([^\s,"]+)                           // (3) original version offered by Stephen
But, unfortunately, I still cannot fully grasp the meaning of (1) and (2). Testing (1) with Stephen's code, I get:

r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
empty,,,fields, , , like this
[empty][][fields][][like][this]
,,,
[][]

There are 2 empty tokens between each run of 3 contiguous ',' but only one is detected for each run.

Likewise, for (2), I get:

r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
empty,,,fields, , , like this
[empty][fields][like][this]

This time, the behavior is no different from the 'original' version.

Thank you, Yechezkel, for your help.

BTW, it seems that by reading http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html I cannot get a full view of the regex grammar. Maybe I need a whole book on it? :-) Is there any *complete* introduction available on the net?

B/Rgds
Max
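The output reported for (1) can be reproduced outside Boost, which may help show where the second empty token goes. The sketch below uses Python's re module, not Boost.Regex, and not Stephen's code (which is not shown in this message); the helpers tokens and show are hypothetical names. In (1), the alternative ,\s*(), consumes both the opening and the closing comma, so the closing comma is no longer available to open the next empty field. PATTERN_LOOKAHEAD is only one possible direction, not something proposed in the thread: it peeks at the closing comma with (?=,) instead of consuming it. Whether Boost.Regex gives the same results for it is an assumption, not something verified here.

```python
import re

# Pattern (1) from the message, unchanged.
PATTERN_1 = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'

# Hypothetical variant (an assumption, not from the thread): the closing
# comma is only peeked at with a lookahead, so it can still open the
# next empty field.
PATTERN_LOOKAHEAD = r'"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$'


def tokens(pattern, line):
    """Return the text of whichever capture group took part in each match."""
    result = []
    for match in re.finditer(pattern, line):
        # Each alternative has exactly one group; the others are None.
        result.append(next(g for g in match.groups() if g is not None))
    return result


def show(pattern, line):
    """Format the tokens in the bracketed style used in the message."""
    return "".join("[%s]" % t for t in tokens(pattern, line))


print(show(PATTERN_1, 'empty,,,fields, , , like this'))
# prints [empty][][fields][][like][this]
print(show(PATTERN_LOOKAHEAD, 'empty,,,fields, , , like this'))
# prints [empty][][][fields][][][like][this]
```

With the lookahead variant, each run of three commas yields two empty tokens on this input. Lines that begin with a comma are not covered by this sketch and would need separate checking.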