Re: [boost] [regex and string algo] again strange split behaviour

14 Jul 2005


On Thu, 14 Jul 2005 13:32:15 +0100, "John Maddock"
<john@johnmaddock.co.uk> wrote:

[boost.regex dropping last empty token]
...
The original rationale was "do the same thing as perl", for example:
perl -e "print join(':', split(/;/, '')) .\"\\n\". join(':', split(/;/, ';')) .\"\\n\". join(':', split(/;/, '1;2')) .\"\\n\". join(':', split(/;/, '1;2;')) .\"\\n\". join(':', split(/;/, ';1;2;'))"
Outputs:
1:2
1:2
:1:2
Note no trailing blank fields; the Perl manual says:
"      split /PATTERN/,EXPR,LIMIT
      split /PATTERN/,EXPR
      split /PATTERN/
      split   Splits a string into a list of strings and returns that list.
              By default, empty leading fields are preserved, and empty
              trailing ones are deleted."
But if I'm not mistaken, you're not really doing the same thing. Perl
drops _all_ empty trailing fields, and from the Boost.Regex description
it looks like you are only dropping the very last one. Perl also has the
option of keeping all empty trailing fields by using a negative number
for LIMIT, as you mentioned.
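To make the difference concrete, here is a minimal sketch in standard C++ of the three trailing-field policies under discussion: keep everything (like a negative LIMIT in Perl), drop only the very last empty field (how the Boost.Regex description reads above), and drop all trailing empty fields (Perl's default). It does not use Boost at all. The names split_fields and Trailing are invented for this illustration, it splits on a single character rather than a regex, and it does not try to reproduce every Perl corner case.

```cpp
#include <string>
#include <vector>

// Hypothetical policy names, invented for this sketch.
enum class Trailing { KeepAll, DropLast, DropAll };

// Split `s` on a single-character delimiter, then apply the
// requested policy to any empty fields at the end of the result.
std::vector<std::string> split_fields(const std::string& s, char delim,
                                      Trailing policy)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(s.substr(start));  // last field, may be empty
            break;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }

    if (policy == Trailing::DropLast) {
        // Remove at most one empty field from the end.
        if (!fields.empty() && fields.back().empty())
            fields.pop_back();
    } else if (policy == Trailing::DropAll) {
        // Remove every empty field from the end.
        while (!fields.empty() && fields.back().empty())
            fields.pop_back();
    }
    return fields;
}
```

For the input "1;2;;" the three policies return 4, 3 and 2 fields respectively, so "drop the last one" and "drop all of them" only agree when there is a single trailing delimiter.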
...
It also kind of makes sense to me: if you want to split on a delimiter, then 
a trailing delimiter does not normally mean you want a trailing blank field: 
indeed trailing delimiters are quite commonly used (think C++ array syntax 
as one example).
I can't speak for everyone else, but I can say that in many of my splits
I would want the last empty field to be retained. I'm parsing
comma/tab/semicolon-separated log lines, CSV files, custom protocols,
and other things where the last field is important, empty or not. An
empty field is still valid data, and the field count in my cases can
determine how I need to parse the data. (For keeping compatibility with
old log file formats, for instance.)
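As an illustration of why the field count matters, here is a small sketch in standard C++. The log layouts are hypothetical: assume an old format with three semicolon-separated fields and a newer one with four, where the fourth field may legitimately be empty. The names split_keep_all, LogFormat and detect_format are made up for this example.

```cpp
#include <string>
#include <vector>

// Split on a single character and keep every field,
// including empty ones at the end of the line.
std::vector<std::string> split_keep_all(const std::string& line, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}

// Hypothetical layouts: 3 fields = old format, 4 fields = new format.
enum class LogFormat { Old, New, Unknown };

// Decide which layout a line uses purely from its field count.
LogFormat detect_format(const std::string& line)
{
    switch (split_keep_all(line, ';').size()) {
        case 3:  return LogFormat::Old;
        case 4:  return LogFormat::New;
        default: return LogFormat::Unknown;
    }
}
```

Here "2005-07-14;INFO;started;" is a new-format line whose fourth field is empty. A splitter that silently drops the trailing empty field would report three fields, and the line would be misread as old format.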

-- 
Be seeing you.

Thore Karlsen