Tokenizer doc problems/confusion (?)

1. http://www.boost.org/libs/tokenizer/tokenizer.htm says:

       template <
           class TokenizerFunc = char_delimiters_separator<char>,
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           class Iterator = std::string::const_iterator,
           class Type = std::string
       >
       class tokenizer

   Yet char_delimiters_separator is officially deprecated. Is that really
   intentional? Wow, it appears to be using the deprecated class template
   for the default!

   Now, I wanted to tokenize an input stream without putting it in a
   string first. It seems to be much harder than necessary:

       #include <map>
       #include <string>
       #include <iostream>
       #include <algorithm>
       #include <iterator>
       #include <boost/tokenizer.hpp>
       #include <boost/lambda/lambda.hpp>

       int main()
       {
           typedef std::map<std::string, unsigned> fmap;

           // Seems awfully complicated
           boost::tokenizer<
               boost::char_delimiters_separator<char>,
               std::istreambuf_iterator<char>
           > t(
               (std::istreambuf_iterator<char>(std::cin)),
               std::istreambuf_iterator<char>()
           );

           fmap f;
           using namespace boost::lambda;
           std::for_each(t.begin(), t.end(), ++var(f)[_1]);

           for (fmap::iterator p = f.begin(), e = f.end(); p != e; ++p)
               std::cout << p->second << ": " << p->first << "\n";
       }

   I can think of lots of ways to simplify the interface, most of which
   center on eliminating the redundant mentions of
   istreambuf_iterator<char>.

   When I throw the following text at it:

   ------
   how much wood could a woodchuck chuck, if a woodchuck could chuck wood?
   ------

   I get:

   2: a
   2: chuck
   2: could
   1: how
   1: if
   1: much
   2: wood
   2: woodchuck

   as desired. But if I replace char_delimiters_separator with
   char_separator, I get:

   15:

   What's up with that?? Even if char_separator did what it was advertised
   to do (and it's not clear that it does), it wouldn't give me the simple
   "find the words" functionality of char_delimiters_separator... so I'm
   baffled by the deprecation.

2. http://www.boost.org/libs/tokenizer/char_separator.htm says:

       explicit char_separator()

       The function std::isspace() is used to identify dropped delimiters
       and std::ispunct() is used to identify kept delimiters. In
       addition, empty tokens are dropped.

   which seems strange in light of the fact that there is no constructor
   taking _functions_ to be used to determine kept/dropped delimiters, and
   nowhere in the text do you indicate that functions are called
   internally. Help?

Thanks,

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
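For comparison, here is a minimal sketch of how the same word counting can
be written with char_separator by handing its const Char* constructor an
explicit set of dropped delimiters. The particular delimiter string, and the
step of reading the stream into a std::string first, are illustrative
choices made here; they are not something the post above proposes.

    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main()
    {
        // Read the whole stream into a string first (the very step the
        // post above was trying to avoid).
        std::string text((std::istreambuf_iterator<char>(std::cin)),
                         std::istreambuf_iterator<char>());

        // The first argument is the set of delimiters to drop; no kept
        // delimiters are specified, and empty tokens are dropped by
        // default, so only the words themselves come back.
        boost::char_separator<char> sep(" \t\n.,;:!?");
        boost::tokenizer<boost::char_separator<char> > tok(text, sep);

        // Count each token.
        std::map<std::string, unsigned> counts;
        for (boost::tokenizer<boost::char_separator<char> >::iterator
                 p = tok.begin(), e = tok.end(); p != e; ++p)
            ++counts[*p];

        // Print "count: word" pairs in sorted order.
        for (std::map<std::string, unsigned>::iterator
                 p = counts.begin(), e = counts.end(); p != e; ++p)
            std::cout << p->second << ": " << p->first << "\n";
    }

Fed the woodchuck text above, this should print the same per-word counts as
the char_delimiters_separator version, since the comma and question mark are
in the dropped-delimiter set.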