
Hi,

I used the split function on the following string:

    vector<string> tokens;
    string str = "( 448 448 64 ) ( 448 0 64 ) ( 0 448 64 ) name 0 0 0 0.5 0.5 0 0 0";
    split( tokens, str, is_any_of( " ()" ), token_compress_on ); // I pass 'space', '(' and ')' into is_any_of()

I expected to get 18 tokens: the first nine values, the string "name", and the remaining eight numbers. Instead, split always returns a string vector containing 19 tokens, where the first element of the vector is an empty string.

Why does the function insert empty strings into the collection? Can I use split to obtain the 18 wanted tokens?

-Dirk

Hello, Saturday, October 16, 2004, 5:38:31 PM, you wrote:
> I used the split function on the following string:
>
>     vector<string> tokens;
>     string str = "( 448 448 64 ) ( 448 0 64 ) ( 0 448 64 ) name 0 0 0 0.5 0.5 0 0 0";
>     split( tokens, str, is_any_of( " ()" ), token_compress_on ); // I pass 'space', '(' and ')' into is_any_of()
>
> I was supposed to get 18 tokens. The first nine values, the string "name" and the remaining eight digits. "split" always returns a string vector containing 19 tokens, where the first element of the vector is an empty string. Why does the function insert empty strings into the collection? Can I use split to obtain the 18 wanted tokens?
Split is designed not to ignore any token. Imagine that you need to parse a comma-delimited string: even an empty string can be a valid token. That is why your result starts with an empty string: your input starts with a separator.

This might seem strange in your case, but I think it is better to produce one extra empty string than to remove it when it would be needed.

What you can do is simply trim the separators from the sequence before splitting, like this:

    trim_if(str, is_any_of(")( "));

Regards,
Pavol

On Sat, 16 Oct 2004 22:23:15 +0200, Pavol Droba <droba@topmail.sk> wrote:
> Split is designed not to ignore any token. Imagine that you need to parse a comma-delimited string: even an empty string can be a valid token. That is why your result starts with an empty string: your input starts with a separator.
Here is a case that doesn't seem to behave properly: input ending with a separator. E.g.:

    string s = ",a,";
    vector<string> tokens;
    split(tokens, s, is_punct(), token_compress_off);

This results in a vector containing "" and "a", but not the final "".

This asymmetrical behavior feels like a bug to me. Any thoughts?

--
Be seeing you.

On Fri, Jan 28, 2005 at 05:40:52PM -0600, Thore Karlsen wrote:
> Here is a case that doesn't seem to behave properly: Input ending with a separator. E.g.:
>
>     string s = ",a,";
>     vector<string> tokens;
>     split(tokens, s, is_punct(), token_compress_off);
>
> This results in a vector containing "" and "a", but not the final "".
>
> This asymmetrical behavior feels like a bug to me. Any thoughts?
Hmm, your reasoning seems logical. The behaviour should not be asymmetric. Now the question is which way to go: whether it is better to include the trailing part or to remove the leading one.

I think that including the trailing part is better. I will see how to fix it.

Thanks,
Pavol
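To make the symmetric behaviour discussed here concrete, the following is a minimal standard-library sketch, not Boost's implementation. The function name split_keep_empty is hypothetical. It treats every separator as the end of one token and the start of the next, so an input with n separators always produces n + 1 tokens, including empty ones at either end.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits on any character in
// `separators` and keeps every token, including empty leading and
// trailing ones. No compression of adjacent separators is done.
std::vector<std::string> split_keep_empty(const std::string& input,
                                          const std::string& separators)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos =
            input.find_first_of(separators, start);
        if (pos == std::string::npos) {
            // Last token: empty when the input ends with a separator.
            tokens.push_back(input.substr(start));
            break;
        }
        // Token between the previous separator (or the start) and this one.
        tokens.push_back(input.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

With this sketch, ",a," split on "," gives three tokens: "", "a" and "". A caller who does not want the empty ends can trim the input first, as suggested earlier in the thread.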

On Sat, 29 Jan 2005 13:22:36 +0100, Pavol Droba <droba@topmail.sk> wrote:
> Hmm, your reasoning seems logical. The behaviour should not be asymmetric. Now the question is which way to go: whether it is better to include the trailing part or to remove the leading one.
>
> I think that including the trailing part is better. I will see how to fix it.
Sounds good. I also think it is better to include the trailing part. That's how I would expect it to behave. It is also easier to trim the string before splitting, if the empty tokens are not wanted, than to manually check for separators at each end of the string and insert the blank tokens yourself.

However, I wonder if it would be easy or logical to add another token_compress variation. I often find myself in the same situation as the original poster, where I simply don't care about empty tokens. For example, reading a line of text from a file and extracting words by splitting on whitespace. If the line started with a tab, that would give an empty word if I didn't trim it first. Trimming is easy enough, but it's extra overhead, and not quite as convenient.

The library looks great, by the way. I originally wrote my own library with similar functionality, but this is much more complete and flexible, and it looks like I can throw away my own library now. :)

--
Be seeing you.
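The "don't care about empty tokens" behaviour asked for above can be sketched with the standard library alone. This is only an illustration of the requested semantics, not an existing Boost option; the name split_skip_empty is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits on any character in
// `separators` and never emits an empty token, so leading, trailing and
// repeated separators are all ignored without a separate trim pass.
std::vector<std::string> split_skip_empty(const std::string& input,
                                          const std::string& separators)
{
    std::vector<std::string> tokens;
    // Jump to the first character that is not a separator.
    std::string::size_type start = input.find_first_not_of(separators);
    while (start != std::string::npos) {
        // The token runs until the next separator or the end of input.
        const std::string::size_type end =
            input.find_first_of(separators, start);
        if (end == std::string::npos) {
            tokens.push_back(input.substr(start));
            break;
        }
        tokens.push_back(input.substr(start, end - start));
        // Skip the whole run of separators before the next token.
        start = input.find_first_not_of(separators, end);
    }
    return tokens;
}
```

For a line such as "\tfoo  bar " split on space and tab, this gives just "foo" and "bar". The trade-off is the one raised earlier in the thread: for comma-delimited data an empty field can be meaningful, and this variant would silently drop it.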
participants (3)
- Dirk Gregorius
- Pavol Droba
- Thore Karlsen