
I have used my own C++ tokenizer in the past, but I would like to use Boost's instead. The predominant use of tokenizing for me is to split on white space, but Boost's default is to use white space AND punctuation. Is there any possibility to have either the default changed, or another TokenizerFunction added, such as ws_separator or something similar?

I know I can use

boost::char_separator<char> sep(" \n\t");

(but do I need to add "\v" to the char set?) but I would rather have something like

boost::ws_separator sep;

and, better, make ws_separator the default TokenizerFunction for tokenizer.

Thanks for listening.

Tom Browder
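For reference, a minimal compilable sketch of the char_separator approach (an illustration assuming only the documented boost::tokenizer interface; the separator string below adds "\v", "\f", and "\r" so it covers every character that isspace() treats as white space in the "C" locale):

#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string text = "split  me\tinto\ntokens";
    // All six "C"-locale white-space characters, including "\v".
    boost::char_separator<char> sep(" \t\n\v\f\r");
    boost::tokenizer<boost::char_separator<char> > tok(text, sep);
    typedef boost::tokenizer<boost::char_separator<char> >::iterator iter;
    for (iter it = tok.begin(); it != tok.end(); ++it)
        std::cout << *it << '\n';
    return 0;
}

When constructed with a set of dropped delimiters, char_separator discards empty tokens by default, so runs of consecutive white space do not produce empty strings.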

On Sat, 11 Jun 2005 12:52:56 -0500, "Tom Browder" wrote:
I have used my own C++ tokenizer in the past, but I would like to use Boost's instead.
The predominant use of tokenizing for me is to split on white space, but Boost's default is to use white space AND punctuation. Is there any possibility to have either the default changed, or another TokenizerFunction added such as ws_separator, or something similar?
I know I can use
boost::char_separator<char> sep(" \n\t");
(but do I need to add "\v" to the char set?)
but I would rather have something like
boost::ws_separator sep;
and, better, make the ws_separator be the default TokenizerFunction for tokenizer.
Have you looked at the string_algo library? I much prefer its split functionality to the tokenizer library, and what you want here is very easy to accomplish with it. Example:

vector<string> v;
split(v, "split me into tokens", is_space(), token_compress_on);

You should really check this library out. It's got a ton of useful stuff.

--
Be seeing you.
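A fuller, compilable version of that split() example (a sketch that just fills in the headers and namespace qualifications the snippet above assumes):

#include <boost/algorithm/string.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string text = "split me into tokens";
    std::vector<std::string> v;
    // token_compress_on merges adjacent delimiters, so runs of
    // white space do not yield empty tokens.
    boost::algorithm::split(v, text, boost::algorithm::is_space(),
                            boost::algorithm::token_compress_on);
    for (std::size_t i = 0; i != v.size(); ++i)
        std::cout << v[i] << '\n';
    return 0;
}

split() appends one string to v for each token, and is_space() classifies characters against the global std::locale() by default.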

From: boost-users-bounces@lists.boost.org On Behalf Of Thore Karlsen
Have you looked at the string_algo library? I much prefer its split functionality to the tokenizer library, and what you want here is very easy to accomplish with it.
No, but I will--thanks for the tip. -Tom