Re: Tokenizer doc problems/confusion (?)

"John R. Bandela" <jbandela@ufl.edu> writes:
Thanks for the e-mail.
The problem of char_separator not giving you the correct words and counts is a bug. It was caused by the latest changes to token_functions to speed up tokenizing non-input iterators. The version in Boost 1.31 should work (it is the one prior to the changes). I have also just fixed it in CVS.
Thanks! All the docs still show char_delimiters_separator as the default, though.
Having char_delimiters_separator as the default tokenizing function was unintentional (i.e. it never got changed).
OK.
As to simplifying the interface, I would really like to hear your ideas.
1. Allow "boost::use_default" for any of the template parameters.
2. Supply a templated one-argument ctor that constructs the start iterator from the argument and default-constructs the end iterator.
Then I could write:
    boost::tokenizer< boost::use_default, std::istreambuf_iterator<char> > t(std::cin);
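To make the second suggestion concrete, here is a minimal standard-library sketch of a templated one-argument constructor. It is not Boost code: iterator_range_sketch, read_all and example_one_argument_ctor are hypothetical names used only for illustration, and the class stands in for just the iterator-holding part of a tokenizer.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for the iterator handling of a tokenizer.
// This is an illustration of the proposal, not the Boost implementation.
template <class Iterator>
class iterator_range_sketch {
public:
    // Existing style: the caller passes both ends explicitly.
    iterator_range_sketch(Iterator first, Iterator last)
        : first_(first), last_(last) {}

    // Proposed style: construct the start iterator from the single
    // argument and default-construct the end iterator.
    // A production version would need to constrain this template so
    // that it does not compete with the copy constructor.
    template <class Source>
    explicit iterator_range_sketch(Source& source)
        : first_(source), last_() {}

    Iterator begin() const { return first_; }
    Iterator end() const { return last_; }

private:
    Iterator first_;
    Iterator last_;
};

// Collects every character covered by the range into a string.
template <class Iterator>
std::string read_all(const iterator_range_sketch<Iterator>& range) {
    return std::string(range.begin(), range.end());
}

// Usage: one argument is enough for a stream-backed range.
inline void example_one_argument_ctor() {
    std::istringstream input("two words");
    iterator_range_sketch<std::istreambuf_iterator<char> > range(input);
    assert(read_all(range) == "two words");
}
```

This only works for iterator types whose default-constructed value means "end", which is the case for std::istreambuf_iterator.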
As to why char_delimiters_separator was deprecated, char_delimiters
                                                    ^^^^^^^^^^^^^^^
What's that? It's not in the doc. Do you mean char_separator?
was supposed to be a replacement for it, doing everything it did, but providing the user with more control in dealing with empty tokens. However, as you have brought up, it also has the unfortunate problem that its default construction returns punctuation as well as words, and thus it is not as simple to use.
Why not just change that?
Perhaps the solution would be making a separate words_delimiter that is hard-wired to return only words (i.e. characters separated by either a space or a punctuation mark) and making that the default.
That'd be OK with me.
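For illustration, the "only words" behavior described above could look roughly like the following. This is a sketch under assumptions: words_only and example_words_only are hypothetical names, the function works on a whole std::string instead of an iterator pair, and it is not the proposed words_delimiter itself.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: returns only the words in
// `text`, treating every whitespace or punctuation character as a
// separator that is dropped from the output.
inline std::vector<std::string> words_only(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        // Cast first: the <cctype> functions require an unsigned char value.
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isspace(ch) || std::ispunct(ch)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current += text[i];
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}

// Usage: punctuation never shows up as a token.
inline void example_words_only() {
    const std::vector<std::string> words = words_only("Hello, world! Bye.");
    assert(words.size() == 3);
    assert(words[0] == "Hello");
    assert(words[2] == "Bye");
}
```

One consequence of hard-wiring std::ispunct this way is that an apostrophe also splits, so "don't" comes back as two words.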
As to why the mention of std::isspace and std::ispunct, I needed a default behavior when the user does not want to provide all the space and punctuation characters (it can be a somewhat long string). I also did not want to hard-code it as a string, so I used those functions instead.
That doesn't demystify things for me. If you have an internal way to use functions to determine kept and dropped delimiters, why don't you give me a way to supply functions, too?
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
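As a sketch of what "supplying functions" might mean, the following standard-library example takes one predicate for dropped delimiters and one for kept delimiters. All names here (split_with, is_space_char, is_punct_char, is_comma, never_kept) are hypothetical and nothing in it is Boost's interface. Passing the std::isspace and std::ispunct wrappers reproduces the default described above, where punctuation is returned as tokens.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Wrappers that make the <cctype> functions safe to call with plain char.
inline bool is_space_char(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
inline bool is_punct_char(char c) {
    return std::ispunct(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical predicate-driven splitter, not Boost code.
// Characters matching is_dropped end a token and are discarded;
// characters matching is_kept end a token and become one-character
// tokens themselves. Empty tokens are dropped.
template <class DroppedPred, class KeptPred>
std::vector<std::string> split_with(const std::string& text,
                                    DroppedPred is_dropped,
                                    KeptPred is_kept) {
    std::vector<std::string> tokens;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        const bool dropped = is_dropped(c);
        const bool kept = !dropped && is_kept(c);
        if (dropped || kept) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            if (kept) {
                tokens.push_back(std::string(1, c));
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// User-supplied predicates: split on commas only, keep nothing.
inline bool is_comma(char c) { return c == ','; }
inline bool never_kept(char) { return false; }

inline void example_split_with() {
    // Whitespace dropped, punctuation kept as tokens.
    assert(split_with("Hi, there!", is_space_char, is_punct_char).size() == 4);
    // Commas dropped, spaces stay inside tokens.
    assert(split_with("a,b c", is_comma, never_kept).size() == 2);
}
```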

David Abrahams <dave@boost-consulting.com> writes:
"John R. Bandela" <jbandela@ufl.edu> writes:
Thanks for the e-mail.
The problem of char_separator not giving you the correct words and counts is a bug. It was caused by the latest changes to token_functions to speed up tokenizing non-input iterators. The version in Boost 1.31 should work (it is the one prior to the changes). I have also just fixed it in CVS.
Thanks! All the docs still show char_delimiters_separator as the default, though.
Should we consider creating some regression tests for the tokenizer?
Robert

Robert Zeh <razeh@archelon-us.com> writes:
David Abrahams <dave@boost-consulting.com> writes:
"John R. Bandela" <jbandela@ufl.edu> writes:
Thanks for the e-mail.
The problem of char_separator not giving you the correct words and counts is a bug. It was caused by the latest changes to token_functions to speed up tokenizing non-input iterators. The version in Boost 1.31 should work (it is the one prior to the changes). I have also just fixed it in CVS.
Thanks! All the docs still show char_delimiters_separator as the default, though.
Should we consider creating some regression tests for the tokenizer?
In status/Jamfile:
    test-suite tokenizer
        : [ run libs/tokenizer/examples.cpp <lib>../libs/test/build/boost_test_exec_monitor ]
          [ run libs/tokenizer/simple_example_1.cpp ]
          [ run libs/tokenizer/simple_example_2.cpp ]
          [ run libs/tokenizer/simple_example_3.cpp ]
          [ run libs/tokenizer/simple_example_4.cpp ]
          [ run libs/tokenizer/simple_example_5.cpp ]
That first one, at any rate, is actually a regression test AFAICT.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

David Abrahams <dave@boost-consulting.com> writes:
In status/Jamfile:
    test-suite tokenizer
        : [ run libs/tokenizer/examples.cpp <lib>../libs/test/build/boost_test_exec_monitor ]
          [ run libs/tokenizer/simple_example_1.cpp ]
          [ run libs/tokenizer/simple_example_2.cpp ]
          [ run libs/tokenizer/simple_example_3.cpp ]
          [ run libs/tokenizer/simple_example_4.cpp ]
          [ run libs/tokenizer/simple_example_5.cpp ]
That first one, at any rate, is actually a regression test AFAICT.
Could we add your example?
Robert

Robert Zeh <razeh@archelon-us.com> writes:
David Abrahams <dave@boost-consulting.com> writes:
In status/Jamfile:
    test-suite tokenizer
        : [ run libs/tokenizer/examples.cpp <lib>../libs/test/build/boost_test_exec_monitor ]
          [ run libs/tokenizer/simple_example_1.cpp ]
          [ run libs/tokenizer/simple_example_2.cpp ]
          [ run libs/tokenizer/simple_example_3.cpp ]
          [ run libs/tokenizer/simple_example_4.cpp ]
          [ run libs/tokenizer/simple_example_5.cpp ]
That first one, at any rate, is actually a regression test AFAICT.
Could we add your example?
If "we" means the maintainer(s) of tokenizer, then be my guest. I hereby relinquish all rights to the example and place it in the public domain.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com