Hi,

If you don't want a container to store the results, you can use split_iterator directly. The split algorithm is only a wrapper around split_iterator.

http://www.boost.org/doc/libs/1_35_0/doc/html/boost/algorithm/split_iterator...
http://www.boost.org/doc/libs/1_35_0/doc/html/string_algo/usage.html#id12907...

Regards,
Pavol.

Florin Trofin wrote:
It turns out that char_separator shamelessly constructs std::strings under the covers, so I gained something, but not as much as I had hoped. The split algorithm you mention requires a container to store the results, so you still have to do one allocation, correct?
Frustrating! In theory one should be able to parse a sequence of tokens without constructing or copying any strings.
Florin.
On Wed, Mar 26, 2008 at 12:54 AM, Pavol Droba <droba@topmail.sk> wrote:

Hi,
Why don't you just use the split algorithm in the StringAlgo library?
http://www.boost.org/doc/html/string_algo/usage.html#id1638440
Regards, Pavol.
Florin Trofin wrote:
> Hi,
>
> I've been using the boost tokenizer successfully in the past and I've
> been quite happy with it. I was using it with std::string as my token
> type, but now I need to use it differently because of performance
> reasons (the input string is a raw UTF8 buffer (const unsigned char*)
> and output is a specific UTF16 string class). So I thought: maybe I can
> just tokenize the unsigned char buffer in place using
> boost::iterator_range as my token type.
>
> And it almost worked! With a hack:
>
> the tokenizer attempts to call assign on my TokenType but
> boost::iterator_range doesn't have such member function. I created a
> wrapper class that simply delegates to the iterator_range's assignment
> operator and it now works!
>
> This is great because I have no more useless string constructions: I can
> go directly from a raw UTF8 buffer to my output string type (UTF16
> based) with only one conversion and no extra allocations! I still have
> the nice syntax of boost tokenizer and the maximum efficiency!
>
> I think this solution should be mentioned in the tutorial docs because
> it might not be obvious for everybody. Also, maybe we can eliminate the
> hack I did by adding an assign() to the boost range interface (this
> seems simpler to me than modifying the tokenizer to not call assign).
>
> Thanks for the great work you guys put into this library!
>
> Best regards,
>
> Florin.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Boost-users mailing list
> Boost-users@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
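
The idea in the original message, tokens that are just ranges over the raw buffer, can also be illustrated without any library. The following is a minimal sketch, not the code from the thread: TokenView, for_each_token and LengthSummer are hypothetical names, and a single-byte delimiter is assumed (which is safe for ASCII delimiters in UTF-8 input).

```cpp
#include <cassert>
#include <cstddef>

// A token is only a [begin, end) view into the caller's buffer.
struct TokenView {
    const unsigned char* begin;
    const unsigned char* end;
    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
};

// Calls visit(token) for every delimiter-separated token in [first, last).
// Nothing is copied or allocated; an empty input yields one empty token.
template <typename Visitor>
void for_each_token(const unsigned char* first, const unsigned char* last,
                    unsigned char delim, Visitor& visit) {
    assert(first != 0 && last != 0);
    const unsigned char* token_start = first;
    for (const unsigned char* p = first;; ++p) {
        if (p == last || *p == delim) {
            TokenView token = {token_start, p};
            visit(token);
            if (p == last) break;
            token_start = p + 1;  // skip the delimiter itself
        }
    }
}

// Example visitor: counts tokens and sums their lengths.
struct LengthSummer {
    std::size_t count;
    std::size_t total;
    LengthSummer() : count(0), total(0) {}
    void operator()(const TokenView& token) {
        ++count;
        total += token.size();
    }
};
```

A visitor that converts each TokenView to the output string type would perform the single conversion per token described above, with no intermediate strings.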