Tokenizer doc problems/confusion (?)

1. http://www.boost.org/libs/tokenizer/tokenizer.htm says:

       template <
           class TokenizerFunc = char_delimiters_separator<char>,
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           class Iterator = std::string::const_iterator,
           class Type = std::string
       >
       class tokenizer

   Yet char_delimiters_separator is officially deprecated. Is that really
   intentional? Wow, it appears to be using the deprecated class template
   for the default!

   Now, I wanted to tokenize an input stream without putting it in a
   string first. It seems to be much harder than necessary:

       #include <map>
       #include <string>
       #include <iostream>
       #include <algorithm>
       #include <iterator>
       #include <boost/tokenizer.hpp>
       #include <boost/lambda/lambda.hpp>

       int main()
       {
           typedef std::map<std::string, unsigned> fmap;

           // Seems awfully complicated
           boost::tokenizer<
               boost::char_delimiters_separator<char>,
               std::istreambuf_iterator<char>
           > t(
               (std::istreambuf_iterator<char>(std::cin)),
               std::istreambuf_iterator<char>()
           );

           fmap f;
           using namespace boost::lambda;
           std::for_each(t.begin(), t.end(), ++var(f)[_1]);

           for (fmap::iterator p = f.begin(), e = f.end(); p != e; ++p)
               std::cout << p->second << ": " << p->first << "\n";
       }

   I can think of lots of ways to simplify the interface, most of which
   center on eliminating the redundant mentions of
   istreambuf_iterator<char>.

   When I throw the following text at it:

   ------
   how much wood could a woodchuck chuck, if a woodchuck could chuck wood?
   ------

   I get:

   2: a
   2: chuck
   2: could
   1: how
   1: if
   1: much
   2: wood
   2: woodchuck

   as desired. But if I replace char_delimiters_separator with
   char_separator, I get:

   15:

   What's up with that?? Even if char_separator did what it was advertised
   to do (and it's not clear that it does), it wouldn't give me the simple
   "find the words" functionality of char_delimiters_separator... so I'm
   baffled by the deprecation.

2. http://www.boost.org/libs/tokenizer/char_separator.htm says:

       explicit char_separator()

       The function std::isspace() is used to identify dropped delimiters
       and std::ispunct() is used to identify kept delimiters. In
       addition, empty tokens are dropped.

   which seems strange in light of the fact that there is no constructor
   taking _functions_ to be used to determine kept/dropped delimiters, and
   nowhere in the text do you indicate that functions are called
   internally. Help?

Thanks,

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
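For comparison, here is a minimal sketch of how the same word counting can
be written with char_separator by handing its const Char* constructor an
explicit set of dropped delimiters. The particular delimiter string, and the
step of reading the stream into a std::string first, are illustrative
choices made here; they are not something the post above proposes.

    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main()
    {
        // Read the whole stream into a string first (the very step the
        // post above was trying to avoid).
        std::string text((std::istreambuf_iterator<char>(std::cin)),
                         std::istreambuf_iterator<char>());

        // The first argument is the set of delimiters to drop; no kept
        // delimiters are specified, and empty tokens are dropped by
        // default, so only the words themselves come back.
        boost::char_separator<char> sep(" \t\n.,;:!?");
        boost::tokenizer<boost::char_separator<char> > tok(text, sep);

        // Count each token.
        std::map<std::string, unsigned> counts;
        for (boost::tokenizer<boost::char_separator<char> >::iterator
                 p = tok.begin(), e = tok.end(); p != e; ++p)
            ++counts[*p];

        // Print "count: word" pairs in sorted order.
        for (std::map<std::string, unsigned>::iterator
                 p = counts.begin(), e = counts.end(); p != e; ++p)
            std::cout << p->second << ": " << p->first << "\n";
    }

Fed the woodchuck text above, this should print the same per-word counts as
the char_delimiters_separator version, since the comma and question mark are
in the dropped-delimiter set.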