New subject: Tokenizer doc problems/confusion (?)

7 Apr 2004

      "John R. Bandela" <jbandela@ufl.edu> writes:
...
Thanks for the e-mail.
The problem with char_separator not giving you the correct words and
counts is a bug. It was caused by the latest changes to
token_functions to speed up tokenizing non-input iterators. The
version in Boost 1.31 should work (it is the one prior to the
changes). I have also just fixed it in CVS.
Thanks!  All the docs still show char_delimiters_separator as the
default, though.
...
Having char_delimiters_separator as the default tokenizing function
was unintentional (i.e., it never got changed).
OK.
...
As to simplifying the interface, I would really like to hear your ideas.
1. Allow "boost::use_default" for any of the template parameters

2. Supply a templated 1-argument ctor that constructs the start
   iterator from the argument and default constructs the end iterator

then I could write:

       boost::tokenizer<
           boost::use_default,
           std::istreambuf_iterator<char> 
       > t(std::cin);
...
As to why char_delimiters_separator was deprecated, char_delimiters
                                                      ^^^^^^^^^^^^^^^
What's that?  It's not in the doc.  Do you mean char_separator?
...
was supposed to be a replacement for it, doing everything it did,
but providing the user with more control in dealing with empty
tokens. However, as you have brought up, it also has the unfortunate
problem that, when default-constructed, it returns punctuation as
well as words and thus is not as simple to use.
Why not just change that?
...
Perhaps the solution would be making a separate words_delimiter that
is hard-wired to return only words (i.e., characters separated by
either a space or a punctuation mark) and making that the default.
That'd be OK with me.
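
For illustration, here is a minimal sketch of the "words only" behavior
described above, written against the standard library alone. The name
split_words is hypothetical; it is not the proposed words_delimiter and
not part of Boost.Tokenizer. It only assumes that "word" means a run of
characters that are neither whitespace nor punctuation.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost.Tokenizer API): returns only the words
// in `text`. Whitespace and punctuation both act as separators; neither
// the separators nor empty tokens are ever returned.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    // Iterating as unsigned char keeps std::isspace/std::ispunct
    // well-defined for characters outside the 7-bit ASCII range.
    for (unsigned char c : text) {
        if (std::isspace(c) || std::ispunct(c)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(static_cast<char>(c));
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

Note that under this definition an apostrophe splits a word, so "It's"
comes back as "It" and "s". Whether that is acceptable for a default
is a design question the sketch does not settle.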
...
As to why the mention of std::isspace and ispunct, I needed a default
behavior when the user does not want to provide all the space and
punctuation characters (it can be a somewhat long string). I also did
not want to hard-code it as a string, so I used those functions instead.
That doesn't demystify things for me.  If you have an internal way to
use functions to determine kept and dropped, why don't you give me a
way to supply functions, too?
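
To make the request concrete, here is a rough sketch of a tokenizing
routine that takes caller-supplied functions for the dropped and kept
delimiters. Everything in it (tokenize_with, CharPredicate, is_space,
is_punct) is a hypothetical name used for illustration. It is not the
Boost.Tokenizer interface, and it assumes that empty tokens are simply
discarded.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical predicate type: returns true if the character is a delimiter.
using CharPredicate = std::function<bool(char)>;

// Hypothetical sketch (not a Boost.Tokenizer API). Characters matching
// `is_dropped` end the current token and are discarded; characters
// matching `is_kept` end the current token and are returned as
// one-character tokens of their own. Empty tokens are never returned.
std::vector<std::string> tokenize_with(const std::string& text,
                                       const CharPredicate& is_dropped,
                                       const CharPredicate& is_kept) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&tokens, &current]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };
    for (char c : text) {
        if (is_dropped(c)) {
            flush();
        } else if (is_kept(c)) {
            flush();
            tokens.push_back(std::string(1, c));
        } else {
            current.push_back(c);
        }
    }
    flush();
    return tokens;
}

// Thin wrappers so the <cctype> classifiers can be passed as predicates;
// the cast avoids undefined behavior for negative char values.
bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
bool is_punct(char c) { return std::ispunct(static_cast<unsigned char>(c)) != 0; }
```

With these, tokenize_with(text, is_space, is_punct) drops whitespace and
returns punctuation as separate tokens, while passing a predicate that
accepts both whitespace and punctuation as is_dropped (and one that
always returns false as is_kept) yields words only.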

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
