
Apologies, I'm still catching up with this thread.
!!! "provided that this would not be an empty string" !!!
Correct.
How about this string: "/abc/abc". Would this result in "", "abc", "abc"?
Yes.
Yet "abc/abc/" would result in "abc", "abc"?
Yes.
That seems terribly unbalanced to me, and this is not the behavior I would expect.
Yes, you may have a point here.
Or is it somewhat modeled after the C++ initializer syntax: { a, b, } is the same as { a, b } but { , a, b } isn't the same ...
Maybe John can comment?
The original rationale was "do the same thing as Perl", for example:

    perl -e "print join(':', split(/;/, '')) .\"\\n\". join(':', split(/;/, ';')) .\"\\n\". join(':', split(/;/, '1;2')) .\"\\n\". join(':', split(/;/, '1;2;')) .\"\\n\". join(':', split(/;/, ';1;2;'))"

Outputs (after two empty lines for the first two cases):

    1:2
    1:2
    :1:2

Note no trailing blank fields. The Perl manual says:

"split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split

Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted."

It also kind of makes sense to me: if you want to split on a delimiter, then a trailing delimiter does not normally mean you want a trailing blank field. Indeed, trailing delimiters are quite commonly used (think C++ array syntax as one example). I believe in Perl you can get the empty trailing field if you specify an arbitrarily large argument as the split field limit.

As far as Boost.Regex is concerned, regex_token_iterator could be used to get either behaviour, given either definition as the starting point, with equivalent ease. Given:

    typedef boost::regex_token_iterator< ...args... > iterator_type;
    iterator_type i( ...args... );

Then, given the current behaviour (stripping a trailing empty field not followed by a delimiter), we know that a trailing field has been stripped if:

    (i++ == iterator_type()) && (i->second != end_of_string_sequence)

Alternatively, if trailing empty fields were to be preserved, then we could spot them when they happen with:

    (i++ == iterator_type()) && (i->first == i->second)

So for me, the question is: which behaviour is more commonly required? At present I can't think of any real world use cases where a trailing empty field would be important, so here's the challenge: can anyone think of a file format, transmission format, command line syntax or whatever where the trailing field is actually required?
Real world cases only please, but first two data points: CSV files and the Unicode character database don't require the output of trailing blanks (and parsing of the latter would certainly break if they were considered).

One more very unscientific data point: historically this has always been the behaviour of regex_split (now deprecated) and its replacement regex_token_iterator, and no one has ever complained: until now that is! :-)

Still sticking to my guns for now....

John.
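
For anyone who wants to experiment with the rule under discussion (empty leading fields preserved, empty trailing fields deleted), here is a minimal sketch in plain C++. It is not the Boost.Regex implementation and does not use regex_token_iterator; the function name split_perl_style is hypothetical, and it assumes a single-character delimiter rather than a regular expression.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: splits `text` on a single
// character and then applies the Perl-style default described above,
// i.e. leading empty fields are kept and trailing empty fields are dropped.
std::vector<std::string> split_perl_style(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;

    // First collect every field, including empty ones.
    while (true) {
        std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(text.substr(start));  // final field (may be empty)
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }

    // Then strip empty fields from the end only; leading ones stay.
    while (!fields.empty() && fields.back().empty()) {
        fields.pop_back();
    }
    return fields;
}
```

With this sketch, "/abc/abc" split on '/' gives "", "abc", "abc", while "abc/abc/" gives "abc", "abc", which is the asymmetry questioned earlier in the thread. Dropping the final loop would give the alternative behaviour where trailing empty fields are preserved.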