
On Thu, 14 Jul 2005 13:32:15 +0100, "John Maddock" <john@johnmaddock.co.uk> wrote: [boost.regex dropping last empty token]
The original rationale was "do the same thing as perl", for example:
perl -e "print join(':', split(/;/, '')) .\"\\n\". join(':', split(/;/, ';')) .\"\\n\". join(':', split(/;/, '1;2')) .\"\\n\". join(':', split(/;/, '1;2;')) .\"\\n\". join(':', split(/;/, ';1;2;'))"
Outputs:


1:2
1:2
:1:2
Note no trailing blank fields, the Perl manual says:
"split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
    Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted."
But if I'm not mistaken, you're not really doing the same thing. Perl drops _all_ empty trailing fields, and from the Boost.Regex description it looks like you are only dropping the very last one. Perl also has the option of keeping all empty trailing fields by using a negative number for LIMIT, as you mentioned.
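To make the difference concrete, here is a minimal sketch in plain C++ of the three behaviours under discussion. It does not use Boost.Regex at all: `split_fields` and the `Trailing` policy names are hypothetical, invented only for this illustration, and `DropLast` merely models the behaviour as I read it from the description, not something I have checked against the library.

```cpp
#include <string>
#include <vector>

// Hypothetical policy names, used only for this illustration.
enum class Trailing {
    KeepAll,   // keep every field, roughly like Perl's split with a negative LIMIT
    DropLast,  // drop at most one empty trailing field
    DropAll    // drop all empty trailing fields, like Perl's default split
};

// Split `s` on the single character `delim`, then apply the chosen
// policy to any empty fields at the end of the result.
std::vector<std::string> split_fields(const std::string& s, char delim,
                                      Trailing policy)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));  // last field, possibly empty
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }

    if (policy == Trailing::DropLast) {
        if (!out.empty() && out.back().empty())
            out.pop_back();
    } else if (policy == Trailing::DropAll) {
        while (!out.empty() && out.back().empty())
            out.pop_back();
    }
    return out;
}
```

For the input "1;2;;" this gives four fields with `KeepAll`, three with `DropLast`, and two with `DropAll`, so the three policies only agree when there is at most one trailing delimiter. One caveat: for an empty input string the sketch's `KeepAll` returns a single empty field, whereas Perl returns an empty list even with a negative LIMIT.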
It also kind of makes sense to me: if you want to split on a delimiter, then a trailing delimiter does not normally mean you want a trailing blank field: indeed trailing delimiters are quite commonly used (think C++ array syntax as one example).
I can't speak for everyone else, but I can say that in many of my splits I would want the last empty field to be retained. I'm parsing comma/tab/semicolon-separated log lines, CSV files, custom protocols, and other things where the last field is important, empty or not. An empty field is still valid data, and the field count in my cases can determine how I need to parse the data. (For keeping compatibility with old log file formats, for instance.)

-- 
Be seeing you.