
On Thu, Feb 10, 2011 at 3:46 PM, Max <more4less@sina.com> wrote:
I have 3 versions of the REs sitting side by side, attempting to figure out the difference between them.
"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$ // (1) "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,) // (2) "([^"]*)"|([^\s,"]+) // (3) original version offered by Stephen
But, unfortunately, I still cannot fully grasp the meaning of (1) and (2).
,\s*(), means: find a ',' followed by any amount of whitespace followed by another ',', and capture an empty string. The others are similar.
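
As a small illustration of that alternative on its own, here is a sketch using Python's re module. Python is an assumption made for the example (the thread does not name the regex engine), and the sample string is hypothetical. The point is that the empty group () takes part in the match and captures an empty string, which is what lets an empty field be reported as a token.

```python
import re

# Hypothetical sample input: two commas with only a space between them.
m = re.search(r',\s*(),', 'a, ,b')

matched_text = m.group(0)   # the whole match, including both commas
empty_field = m.group(1)    # the empty group captures '' (not None)
```

Note that the whole match includes the second ',', even though the captured field is empty.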
r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
empty,,,fields, , , like this
[empty][][fields][][like][this]
,,,
[][]
There are 2 empty tokens in each run of 3 contiguous ','s, but only one is detected per run.
Yes, that's a mistake. When matching ,, as an empty field, the second ',' is eaten and can no longer be used as the beginning of the next field.

"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$

should work. (?=...) is a lookahead: it checks that the pattern (',' in this case) matches at this point, but doesn't eat any input.
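
To see the difference, here is a minimal sketch that runs pattern (1) and the lookahead version over the sample line. It assumes Python's re module, which may not be the engine used in this thread; the names CONSUMING, LOOKAHEAD and tokenize are made up for the example.

```python
import re

# Pattern (1) from the thread, and the corrected version with lookaheads.
CONSUMING = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
LOOKAHEAD = r'"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$'

def tokenize(pattern, text):
    """Return the one participating capture group of each match."""
    tokens = []
    for m in re.finditer(pattern, text):
        # Exactly one group takes part in each match; the others are None.
        tokens.append(next(g for g in m.groups() if g is not None))
    return tokens

line = 'empty,,,fields, , , like this'
before = tokenize(CONSUMING, line)   # second ',' of each pair is consumed
after = tokenize(LOOKAHEAD, line)    # second ',' is only looked at
```

Under these assumptions, before has one empty token per run of three commas, as in the output quoted above, while after has two.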
Likewise, for (2), I get:
r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
empty,,,fields, , , like this
[empty][fields][like][this]
This time, the behavior is no different than the 'original' version.
I get the same results as the first version. Perhaps it wasn't escaped properly?

Yechezkel Mett
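
For anyone who wants to check this, a rough sketch that compares versions (1) and (2) on the two sample inputs. It assumes Python's re module rather than whatever engine was used above, so it says nothing about how the pattern was escaped in the original program; V1, V2 and fields are names invented for the example.

```python
import re

V1 = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
V2 = r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)'

def fields(pattern, text):
    # m.lastindex is the number of the one group that took part in the match.
    return [m.group(m.lastindex) for m in re.finditer(pattern, text)]

samples = ['empty,,,fields, , , like this', ',,,']
same = all(fields(V1, s) == fields(V2, s) for s in samples)
```

In Python both versions produce the same tokens for these inputs, empty ones included, which points to something outside the pattern itself (such as escaping) when version (2) reports no empty tokens at all.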