
Hi, For long time in my daytime projects I was using my class token_iterator designed based on old iterator_adaptor design (adopted for old compilers). Now when need arose for such functionality in Boost.Test. I looked in direction of boost::token_iterator. After some struggle I end up adopting my version to new design. Here some of comments and issues that made me opt so. 1. token_iterator follows in a sense a Policy-based design (PBD). Where TokenizerFunc is a Policy that implements most of the functionality. The problem here IMO (unfortunately you see it now quite frequently), that I do not see a reason to use such a design. PBD makes sense it your generic component do a lot of work, provide common interface, combine several independent policies. In this case most what token_iterator does is just a forwarding (it become especially clear if you switch to use input_iterator_adaptor). It wouldn't be that bad, but PBD in this case causes significant usability inconvenience. Instead of plain and simple: token_iterator it( "a:b", ":" ); I have to write something like: token_iterator<char_delimiters_separator<char> > it( char_delimiters_separator<char>( ":" ), "a:b" ). All of that just because we have separate notion of Tokenizer function. IMO it too big price to pay. My suggestion is to get rid of the TokenizerFunction notion. Just implement many different iterators with the same scope of interest: tokenization. We will need to find proper names for iterators and instead of TokenizerFunc model description we may have page explaining how to write custom token iterators (if necessary, which I doubt). 2. token_iterator deals with begin/end iterators. I understand that in some rare cases there is need to iterate over range of iterators (BTW, std::input_iterator gonna be bad example - did you ever tried to use token_iterator over it?). But in most case we want to tokenize a string. And in most cases it either std::string or [char|wchar_t] const*. The fact that Iterator is a template parameter makes token_iterator usage quite inconvenient. In most cases the only template parameter that is required is IMO character type. My suggestion: lets have 2 different types if token iterators: one to iterate over string and separate to iterate over range of iterators. 3. I found it confusing in many cases how delimiters gets specified in token_iterator: token_iterator( "...", ":", " " ) Which string represent dropped delimiters, which kept? And what if I just want empty tokens? I need to write token_iterator( "...", "", "", keep_empty_tokens ) What are these two empty strings doing here?My suggestion is to use named parameters scheme. Like this: token_iterator( "...", dropped_delimiters = ":", kept_delimiters = " " ); 4. 2 Tokenizer functions use Traits as *policy* parameter. What actually is required in Compare predicate, that may or may not be implemented using char_traits. While std::basic_string references inside tokenizer function implementation should refer to traits by name std::char_traits 5. What if I want to use isspace as kept delimiters policy? There is no way to specify that. IMO delimiter policy should be separated (as runtime policy). 6. You use iterator_category as a selector of assigner. It's better to use traversal instead 7. In many places you use #if !defined(BOOST_MSVC) || BOOST_MSVC > 1300 typename #endif use BOOST_DEDUCED_TYPENAME instead 8. What is the purpose of token_iterator_generator. I do not see a need for one with new iterator facade design. 9. Why would we need all the access methods? Iterator base()const{return begin_;} Iterator end()const{return end_;}; TokenizerFunc tokenizer_function()const{return f_;} Type current_token()const{return tok_;} bool at_end()const{return valid_;} I couldn't imagine why do we need them. 10 Token iterator asserts in attempts to dereference or iterate end iterator. I don't believe it's correct. In many cases (mostly in design that assumes some optional tokens in the end) it's much more convenient this does not happened. My suggestion is to return empty token on dereference and stay quite on iteration. 11. In simpler design there won't be need in template copy constructor for token_iterator 12. Documentation need to be updated IMO, cause even as things stands now it dies not represent the best practice moved from type generators usage. 13. Files under libs/tokenizer lacks any structure required by boost standards. 14. There is no real test suite only some small examples supplied. 15. Why escaped_list_separator was named this way. It seems strange to emphasize in a name the fact that it use escape symbols, while major separations is that it is CSV list iterator - and this is how I would name it I implemented string_token_iterator and range_token_iterator for use in Boost.Test. See here (in a couple hours) http://tinyurl.com/28db9 for implementation and here http://tinyurl.com/287uk for test suite. Regards, Gennadiy.