
On Sat, 11 Jun 2005 12:52:56 -0500, "Tom Browder" wrote:
I have used my own C++ tokenizer in the past, but I would like to use Boost's instead.
I mostly use tokenizing to split on white space, but Boost's default is to split on white space AND punctuation. Would it be possible either to change the default or to add another TokenizerFunction, such as ws_separator or something similar?
I know I can use
boost::char_separator<char> sep(" \n\t");
(but do I need to add "\v" to the char set?)
but I would rather have something like
boost::ws_separator sep;
and, better, make the ws_separator be the default TokenizerFunction for tokenizer.
Have you looked at the string_algo library? I much prefer its split functionality to the tokenizer library, and what you want here is very easy to accomplish with it. Example:
vector<string> v;
split(v, "split me into tokens", is_space(), token_compress_on);
You should really check this library out. It's got a ton of useful stuff.
--
Be seeing you.
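For comparison, here is a rough sketch of what the ws_separator asked for in the quoted message might look like. To be clear, ws_separator is a hypothetical name and not part of Boost; the class only follows the general shape of a TokenizerFunction (a reset() member plus a templated operator()), and split_ws is a made-up helper that drives the functor directly so the example needs nothing beyond the standard library. It assumes char input and the default "C" locale, where std::isspace also accepts "\v", "\f" and "\r", so no hand-written character set is needed.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical ws_separator: NOT part of Boost. It mirrors the general
// shape of a TokenizerFunction: reset() plus a templated operator().
class ws_separator {
public:
    // The functor keeps no state, so there is nothing to reset.
    void reset() {}

    // Skips leading white space, then copies the next run of non-space
    // characters into tok. Returns false when no token is left.
    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        tok = Token();
        while (next != end && is_ws(*next)) ++next;
        if (next == end) return false;
        while (next != end && !is_ws(*next)) {
            tok += *next;
            ++next;
        }
        return true;
    }

private:
    // Assumption: char input. The cast avoids undefined behaviour for
    // negative char values passed to std::isspace.
    static bool is_ws(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

// Hypothetical helper: calls the functor until the input is exhausted.
std::vector<std::string> split_ws(const std::string& text) {
    std::vector<std::string> tokens;
    ws_separator sep;
    std::string::const_iterator it = text.begin();
    std::string tok;
    while (sep(it, text.end(), tok)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

If the functor really does satisfy the TokenizerFunction requirements, it should be usable as the second template argument of boost::tokenizer, but that is untested here. Runs of white space are collapsed, which matches what token_compress_on does for interior separators in the split call above.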