Re: [Boost-users] Boost tokenizer and range support

3 Apr 2008

Hi,

If you don't want a container to store the results, you can use
split_iterator directly. The split algorithm only wraps split_iterator.

http://www.boost.org/doc/libs/1_35_0/doc/html/boost/algorithm/split_iterator...
http://www.boost.org/doc/libs/1_35_0/doc/html/string_algo/usage.html#id12907...

Regards,
Pavol.

Florin Trofin wrote:
...
Turns out that char_separator shamelessly constructs std::strings
under the covers, so I gained something, but not as much as I hoped. The
split algorithm you mention requires a container to store the results, so
you still have to do one allocation, correct?
Frustrating! In theory one should be able to parse a sequence of tokens 
without constructing or copying any strings.
Florin.
On Wed, Mar 26, 2008 at 12:54 AM, Pavol Droba <droba@topmail.sk> wrote:
Hi,
Why don't you just use the split algorithm in the StringAlgo library?
http://www.boost.org/doc/html/string_algo/usage.html#id1638440
Regards,
    Pavol.
Florin Trofin wrote:
     > Hi,
     >
     >
     > I've been using the boost tokenizer successfully in the past and I've
     > been quite happy with it. I was using it with std::string as my token
     > type, but now I need to use it differently because of performance
     > reasons (the input string is a raw UTF8 buffer (const unsigned char*)
     > and output is a specific UTF16 string class). So I thought: maybe I can
     > just tokenize the unsigned char buffer in place using
     > boost::iterator_range<const unsigned char*> as my token type.
     >
     > And it almost worked! With a hack:
     >
     > The tokenizer attempts to call assign() on my TokenType, but
     > boost::iterator_range doesn't have such a member function. I created a
     > wrapper class that simply delegates to the iterator_range's assignment
     > operator, and it now works!
     >
     > This is great because I have no more useless string constructions: I can
     > go directly from a raw UTF8 buffer to my output string type (UTF16
     > based) with only one conversion and no extra allocations! I still have
     > the nice syntax of boost tokenizer and the maximum efficiency!
     >
     > I think this solution should be mentioned in the tutorial docs because
     > it might not be obvious for everybody. Also, maybe we can eliminate the
     > hack I did by adding an assign() to the boost range interface (this
     > seems simpler to me than modifying the tokenizer to not call assign).
     >
     > Thanks for the great work you guys put into this library!
     >
     >
     > Best regards,
     >
     >
     > Florin.
     >
     >
     >
    ------------------------------------------------------------------------
     >
     > _______________________________________________
     > Boost-users mailing list
     > Boost-users@lists.boost.org
     > http://lists.boost.org/mailman/listinfo.cgi/boost-users