Re: [boost] [Tokenizer]Usage and documentation

13 Feb 2011

On Thu, Feb 10, 2011 at 3:46 PM, Max <more4less@sina.com> wrote:
...
I have 3 versions of the REs sitting side by side, trying to figure out
the difference between them.
...
"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$    // (1)
"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)        // (2)
"([^"]*)"|([^\s,"]+)                            // (3) original version offered by Stephen
But, unfortunately, I still cannot fully grasp the meaning of (1) and (2).
,\s*(),

means: find a ',' followed by any number of whitespace characters (\s),
followed by another ',', and capture an empty string.

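The others are similar. ^\s*(), matches an empty field at the start of
the input (optional whitespace, then a ','). ,\s*()$ matches one at the
end (a ',', then optional whitespace up to the end of the input).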
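Here is a minimal sketch of that first alternative on its own. It assumes C++11 std::regex with its default ECMAScript grammar rather than Boost.Regex, and the helper name finds_empty_field is made up for illustration. The empty group () consumes no characters, but it still takes part in the match. So sub_match::matched is true, and that is how a tokenizer can tell "empty field found" apart from "this group wasn't used":

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the text contains the ,\s*(), alternative
// from regex (1), i.e. two commas separated only by whitespace.
bool finds_empty_field(const std::string& text) {
    static const std::regex comma_pair(R"(,\s*(),)");
    std::smatch m;
    // The group matched zero characters, but matched == true records that it
    // participated, which is what signals an empty field.
    return std::regex_search(text, m, comma_pair) && m[1].matched && m[1].length() == 0;
}
```

For example, finds_empty_field("a, ,b") is true and finds_empty_field("a,b") is false.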
...
r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
empty,,,fields, , , like this
[empty][][fields][][like][this]
,,,
[][]
There should be 2 empty tokens within each run of 3 contiguous ',', but
only one is detected for each run.
Yes, that's a mistake. When matching ,, as an empty field, the second
',' is eaten and can no longer be used as the beginning of the next
field.
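The consumption is easy to see by printing where each match starts and how long it is. Below is a sketch, again assuming C++11 std::regex instead of Boost.Regex. The names match_spans and kPattern1 are hypothetical. With regex (1) on ",,", the first match covers both of the first two commas (position 0, length 2), so the next search starts at the third comma and only the end-of-input alternative is left:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: (start position, length) of every match. After a
// non-empty match, std::sregex_iterator resumes searching where that match
// ended, so characters inside one match are never seen by the next.
std::vector<std::pair<std::size_t, std::size_t>> match_spans(const std::string& input,
                                                             const std::regex& re) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        spans.emplace_back(static_cast<std::size_t>(it->position()),
                           static_cast<std::size_t>(it->length()));
    }
    return spans;
}

// Regex (1) from the thread. The custom "re" delimiter is needed because the
// pattern contains )" which would end a plain R"(...)" literal.
const std::regex kPattern1(R"re("([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$)re");
```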

"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$

should work. (?=...) is a lookahead: it checks that the pattern inside it
(',' in this case) matches at this point, but doesn't eat any input.
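Below is a sketch of a tokenizer run with the lookahead version. It uses C++11 std::regex (ECMAScript), which supports (?=...), in place of Boost.Regex. bracketed_tokens and kLookahead are hypothetical names, and the [..] output format just mimics the results quoted above. Tracing it by hand, "empty,,,fields, , , like this" should come out as [empty][][][fields][][][like][this], with two empty tokens per run of three commas:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: for each match, emit the text of the one capture group
// that took part, wrapped in brackets. Testing sub_match::matched (rather than
// length() > 0) keeps empty fields as "[]".
std::vector<std::string> bracketed_tokens(const std::string& input, const std::regex& re) {
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        for (std::size_t i = 1; i < m.size(); ++i) {
            if (m[i].matched) {
                tokens.push_back("[" + m[i].str() + "]");
                break;  // each alternative holds exactly one group
            }
        }
    }
    return tokens;
}

// The lookahead version: (?=,) checks for the next ',' without consuming it,
// so that comma can start the following empty field.
const std::regex kLookahead(
    R"re("([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$)re");
```

One thing the trace also suggests: on ",,," this pattern yields three empty tokens, not four. At position 0 the ,\s*()(?=,) alternative is tried before ^\s*()(?=,), so the leading empty field is never reported. Moving the ^ alternative ahead of it appears to fix that, at least with std::regex's ordered alternation and its handling of zero-length matches.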
...
Likewise, for (2), I get:
r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
empty,,,fields, , , like this
[empty][fields][like][this]
This time, the behavior is no different than the 'original' version.
I get the same results with (2) as with (1). Perhaps it wasn't escaped properly?
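In C++ source, one easy way to mangle a pattern like (2) is the string-literal escaping. Every '"' and every backslash has to be escaped, so \s must be written \\s. On some compilers (GCC, for example) a stray "\s" draws only a warning and silently becomes a plain 's'. Below is a sketch of two spellings that should give the same pattern, assuming C++11 std::regex. The names kPattern2Escaped, kPattern2Raw and make_pattern2 are hypothetical:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Regex (2) as an ordinary literal: '"' becomes \" and \s becomes \\s.
const std::string kPattern2Escaped =
    "\"([^\"]*)\"|([^\\s,\"]+)|(?:^|,)\\s*()(?:$|,)";

// The same pattern as a raw string literal. The "re" delimiter is needed
// because the pattern itself contains )" which would end a plain R"(...)".
const std::string kPattern2Raw =
    R"re("([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,))re";

// Hypothetical helper: compile the raw spelling with the default ECMAScript grammar.
std::regex make_pattern2() { return std::regex(kPattern2Raw); }
```

Comparing the two strings, or printing the pattern your program actually passes to the regex constructor, is a quick way to rule escaping in or out.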

Yechezkel Mett