Victor, my humble tokenizer:
string input_to_be_tokenized;
istringstream ss( input_to_be_tokenized );
string s;
deque<string> tokens;
while (ss >> s)
    tokens.push_back(s);
I made 3 test programs:

string inp;
for (int i = 0; i < 10000000; ++i)
    inp += " a";
// generate a container of tokens from inp with one of three methods:
//   - my method (see above)
//   - Victor's method
//   - Boost, using a separator of (" \n\t")
// if desired, loop over all the tokens using operator[] for the two
// deques, and the iterator for Boost's container
Then I compiled them (gcc 3.4.2, Fedora Core 3, i386):

    g++ -pg -o progN progN.cc

Ran them without the final loop and saved 'gmon.out' under a unique name.
Ran them again with the final loop and saved 'gmon.out' under a unique name.
Ran gprof on all six gmon files and saved the outputs to unique files:

    gprof progN gmonX > X.prof
The accumulated times (sec) are surprising:

              Boost   Victor's    Mine
              =====   ========   =====
    no loop    1.50      38.84   20.13
    loop     131.91      38.89   23.91
Granted, I didn't run the tests multiple times, but it seems to me that the
Boost tokenizer is great if you don't need to iterate through it, but it is
the pits if you do.

I'll send you my code and results if you're interested.

-Tom
I generated a string with 10,000,000 tokens (" a"): " a a a ....a" and timed
your tokenizer against mine 10 times. Mine beat yours by 2 to 3 seconds
every time.

Then I used the Boost tokenizer and the timings went WAY down.

So I think the benefits of the Boost tokenizer are well worth it, even for
trivial tokenizing.

-Tom
At 19:19 2005-06-11, you wrote:
> for tokenizing on whitespace, simple stream input (>>) to a
> std::string suffices.

My own tokenizer does just that--and puts the tokens into a deque.

> IMO, it's hardly worth troubling yourself with a tokenizer
> for whitespace.
Well, not really. When parsing line-oriented output and semi-known
structured lines, it's handy to be able to sometimes work with a line's
tokens as if they were in a vector or deque.
string yourline;
istringstream is( yourline );
deque<string> yourvec( (istream_iterator<std::string>( is )),
                       istream_iterator<std::string>() );
Voila, a deque. It would be interesting to profile that against the
hypothetical indexable tokenizer.
In fact, I was going to add a suggestion that the tokenizer also have the
[] operator so that the individual tokens could be addressed as tok[1], etc.
-Tom
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr.    http://rudbek.com
The five most dangerous words in the English language:
    "There oughta be a law"