Re: [Boost-users] Boost tokenizer and range support

26 Mar 2008

Hi,

Why don't you just use the split algorithm in the StringAlgo library?

http://www.boost.org/doc/html/string_algo/usage.html#id1638440

Regards,
Pavol.

Florin Trofin wrote:
...
Hi,
I've used the Boost tokenizer successfully in the past and have been 
quite happy with it. I was using it with std::string as my token type, 
but now I need to use it differently for performance reasons: the input 
is a raw UTF-8 buffer (const unsigned char*) and the output is a 
specific UTF-16 string class. So I thought: maybe I can just tokenize 
the unsigned char buffer in place, using 
boost::iterator_range<const unsigned char*> as my token type.
And it almost worked, with one hack: the tokenizer calls assign() on my 
token type, but boost::iterator_range has no such member function. I 
created a wrapper class whose assign() simply delegates to 
iterator_range's assignment operator, and now it works!
This is great because I have no more useless string constructions: I can 
go directly from a raw UTF8 buffer to my output string type (UTF16 
based) with only one conversion and no extra allocations! I still have 
the nice syntax of boost tokenizer and the maximum efficiency!
I think this solution should be mentioned in the tutorial docs because 
it might not be obvious to everybody. Also, maybe we can eliminate the 
hack by adding an assign() member to boost::iterator_range (this seems 
simpler to me than modifying the tokenizer so that it doesn't call 
assign()).
Thanks for the great work you guys put into this library!
Best regards,
Florin.
------------------------------------------------------------------------
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [Boost-users] Boost tokenizer and range support

Pavol Droba