
On Thu, Feb 10, 2011 at 9:32 AM, Max <more4less@sina.com> wrote: [Stephan T. Lavavej <stl@exchange.microsoft.com> wrote:]
[Max]
The part I could not interpret is "^|[\s,]" and "$|[\s,]".
The docs say:
A '^' character shall match the start of a line. A '$' character shall match the end of a line.
Yes, I'm aware of this. But even with this in mind, I cannot interpret "^|[\s,]" and "$|[\s,]". For the former, I know '|' means alternation, but how can it come after '^'? For the latter, how can "|[\s,]" be expected after the end of a line (plus the same confusion as above)?
^|[\s,] means _either_ the beginning of the line _or_ a space or comma. In other words, the field starts either at the beginning of the line or after a space or comma. Likewise, $|[\s,] means the field ends either at the end of the line or before a space or comma.
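To see the boundary idea in isolation, here is a small sketch using C++11 std::regex with its default ECMAScript grammar. The helper has_field and the sample strings are hypothetical, not taken from the code under discussion. Because '|' has the lowest precedence of any regex operator, each alternation is wrapped in a non-capturing group (?:...) so it applies only to the boundary, not to the whole pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `word` appears in `line` as a complete field,
// i.e. preceded by start-of-line, a space, or a comma, and followed by
// end-of-line, a space, or a comma.
// Assumption: `word` contains no regex metacharacters (it is pasted in unescaped).
bool has_field(const std::string& line, const std::string& word) {
    // (?:^|[\s,])  field starts at the beginning of the line OR after a space/comma
    // (?:$|[\s,])  field ends at the end of the line OR before a space/comma
    const std::regex re("(?:^|[\\s,])" + word + "(?:$|[\\s,])");
    return std::regex_search(line, re);
}
```

So has_field("foo,bar", "foo") and has_field("x, bar", "bar") are true, while has_field("foobar", "foo") is false because "foo" there is followed by 'b' rather than by the end of the line, a space or a comma.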
One more question: with your code, any empty 'token' between two contiguous commas is ignored. What if someday I'd like to pick them up?
"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$

I'm presuming an empty line should count as no tokens; if you don't mind an empty line being one token, it can be simplified to

"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)

Not really that much simpler.

Yechezkel Mett
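A sketch of how either pattern might be driven with std::sregex_iterator, assuming C++11 <regex> with the default ECMAScript grammar; tokenize, kFullPattern and kSimplePattern are hypothetical names. Each top-level alternative has exactly one capture group, so the token is whichever group took part in the match, and the empty groups () report empty tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// The pattern from the post: quoted field | bare field | empty field between
// commas | empty first field | empty last field. One capture group per branch.
const std::regex kFullPattern(
    R"re("([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$)re");

// The simplified variant: an empty line counts as one (empty) token.
const std::regex kSimplePattern(
    R"re("([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,))re");

// Hypothetical helper: returns the tokens of `line`, taking each token from
// whichever capture group participated in the match.
std::vector<std::string> tokenize(const std::string& line, const std::regex& re) {
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        for (std::size_t i = 1; i < m.size(); ++i) {
            if (m[i].matched) {  // an empty () group still counts as matched
                tokens.push_back(m[i].str());
                break;
            }
        }
    }
    return tokens;
}
```

For the line one,"two, three",,four, both patterns should give the tokens "one", "two, three", "", "four" and "". For an empty line the first pattern gives no tokens and the simplified one gives a single empty token, matching the trade-off described above. One thing worth checking before relying on it: the ,\s*(), branch consumes the closing comma, so a run such as a,,,b yields only one empty token between "a" and "b" instead of two.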