Hi,

I want to tokenize a string on one or more spaces or tabs:

    std::string s = " a b c d \te ";
    std::vector<std::string> tokens;
    boost::algorithm::split(tokens, s, boost::algorithm::is_any_of(" \t"), token_compress_on);
    copy(tokens.begin(), tokens.end(), ostream_iterator<string>(cout, "\n"));

This gives a more or less correct result, except that it does not remove spaces/tabs from the beginning and end. That can be done by calling trim(s); first and then the above. But is there any way to do it in split itself?
Hi,

I use boost::tokenizer for this problem. Example:

    // eingabe is the input string; delims holds the delimiter characters.
    int split(std::vector<std::string>& list, const std::string& eingabe,
              const std::string& delims, bool keep_empty)
    {
        list.clear();
        // Choose whether empty tokens are kept or dropped.
        boost::empty_token_policy empty_tokens =
            keep_empty ? boost::keep_empty_tokens : boost::drop_empty_tokens;
        boost::char_separator<char> sep(delims.c_str(), "", empty_tokens);
        typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
        tokenizer tokens(eingabe, sep);
        for (tokenizer::iterator tok_iter = tokens.begin(); tok_iter != tokens.end(); ++tok_iter)
            list.push_back(*tok_iter);
        return static_cast<int>(list.size());
    }

abir basak wrote:
Hi, I want to tokenize a string on one or more spaces or tabs:

    std::string s = " a b c d \te ";
    std::vector<std::string> tokens;
    boost::algorithm::split(tokens, s, boost::algorithm::is_any_of(" \t"), token_compress_on);
    copy(tokens.begin(), tokens.end(), ostream_iterator<string>(cout, "\n"));

This gives a more or less correct result, except that it does not remove spaces/tabs from the beginning and end. That can be done by calling trim(s); first and then the above. But is there any way to do it in split itself?
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Heiko Fechner wrote:
Hi,
I use boost::tokenizer for this problem.
Thanks. I have also used boost::tokenizer<> a few times, and it doesn't have those problems. But it doesn't pre-tokenize the string. I use it only when I need the first token. (Previously I used the >> operator to check the first token, and then getline to eat the rest of the line when the first token was not of interest. But someone said that is a no-no :( )

In this case I need to know how many tokens there are, so split is handier here. Otherwise I need to run the iterator once before accessing the data, just as in the code you showed.

--
Abir Basak, Member IEEE
Software Engineer, Read Ink Technologies
B. Tech, IIT Kharagpur
email: abir@abirbasak.com
homepage: www.abirbasak.com
Hi,

abir basak wrote:
Hi, I want to tokenize a string on one or more spaces or tabs:

    std::string s = " a b c d \te ";
    std::vector<std::string> tokens;
    boost::algorithm::split(tokens, s, boost::algorithm::is_any_of(" \t"), token_compress_on);
    copy(tokens.begin(), tokens.end(), ostream_iterator<string>(cout, "\n"));

This gives a more or less correct result, except that it does not remove spaces/tabs from the beginning and end.
There is a reason for this. When split behaves this way, it is possible to recreate the original string. Also, there are situations where an empty token at the beginning or end has a meaning.
It can be done like trim(s); and then call the above.
This is a standard solution.
But is there any other way it can be done in split itself ?
No, this behaviour is intentional.

Best regards,
Pavol.
participants (3)
- abir basak
- Heiko Fechner
- Pavol Droba