Fastest way of serializing a huge vector of ints into a human readable string
Hi,
I have a program which produces a vector of integers (several million entries). I need to write that into a human-readable string of space-separated numbers.
I wonder, what would be the fastest way?
My first attempt was
    stringstream resultStream;
    copy(integers.begin(), integers.end(), ostream_iterator<int>(resultStream, " "));
    string result = resultStream.str();
But that requires copying the string.
Thus I was wondering if I could use boost to write to the target string directly?
boost::iostreams allows me to do the following:
    string result;
    boost::iostreams::filtering_ostream out(boost::iostreams::back_inserter(result));
    copy(integers.begin(), integers.end(), ostream_iterator<uint64>(out, " "));
This is faster by almost a factor of 2.
Any ideas how to increase speed even more?
Thanks in advance,
Roland
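Hm, maybe boost::spirit::karma could help you here, but I don't think it will gain much performance, if any.
Maybe you could write a functor that does the conversion yourself.
If the numbers repeat often, you could also build a table so that each value is converted to a string only once. Parallelizing with Boost.Thread would also be an option.
Regards,
Jens Weller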
Date: Fri, 27 Feb 2009 18:46:07 +0100
From: Roland Bock
To: boost-users@lists.boost.org
Subject: [Boost-users] Fastest way of serializing a huge vector of ints into a human readable string
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
Jens Weller wrote:
> Hm, maybe boost::spirit::karma could help you here, but I don't think it will gain much performance, if any.
> Maybe you could write a functor that does the conversion yourself. If the numbers repeat often, you could also build a table so that each value is converted to a string only once. Parallelizing with Boost.Thread would also be an option.
Hi Jens,
hmm. Documentation about boost::spirit::karma seems to be well hidden. Even a Google search like
karma site:www.boost.org/doc/libs/1_38_0/libs/spirit
did not reveal any hits. Can you give me a link?
Writing my own functor might be an option, I guess. The numbers do not repeat often, though. Quite the opposite: it is certain that every number appears just once. And it could be anything within a 64-bit range.
Parallelization certainly could be helpful in some scenarios, but I have a multi-threaded application already. So splitting up the serialization might increase speed for the serialization itself, while slowing down the total process due to increased context switches. I therefore want to have the single-threaded operation as efficient as possible.
Regards,
Roland
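For readers following the "write your own functor" idea, here is a minimal sketch of a hand-written converter that appends decimal digits straight to the target string, so no temporary string is created per number. It is an illustration only and not code posted in this thread: the names append_decimal and join_with_spaces are made up, the reserve() factor is a guess, and unlike the ostream_iterator versions above it writes no trailing space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append the decimal digits of `value` to `out` without a temporary string.
// (Hypothetical helper, not part of Boost or the standard library.)
inline void append_decimal(std::string& out, std::uint64_t value) {
    char buf[20];                        // 2^64 - 1 has exactly 20 decimal digits
    char* const end = buf + sizeof buf;
    char* p = end;
    do {                                 // fill the buffer from the back
        *--p = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    assert(p >= buf);                    // sanity check: never more than 20 digits
    out.append(p, static_cast<std::size_t>(end - p));
}

// Join all values with single spaces (no trailing separator).
inline std::string join_with_spaces(const std::vector<std::uint64_t>& values) {
    std::string out;
    out.reserve(values.size() * 11);     // rough guess to limit reallocations
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out.push_back(' ');
        append_decimal(out, values[i]);
    }
    return out;
}
```

Calling join_with_spaces(integers) would then replace the stringstream-based code; whether it is actually faster has to be measured on the target compiler.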
Hi Roland,
I'm not sure whether karma speeds up the process at all. As far as I know it works on an iterator interface, so you would have the option to control what happens underneath.
> hmm. Documentation about boost::spirit::karma seems to be well hidden. Even a Google search like
> karma site:www.boost.org/doc/libs/1_38_0/libs/spirit
> did not reveal any hits. Can you give me a link?
Karma does not seem to be documented well enough yet. But there are some examples which are shipped with Boost: boost/libs/spirit/examples/karma/, e.g. http://www.boost.org/doc/libs/1_38_0/libs/spirit/example/karma/quick_start1....
> Writing my own functor might be an option, I guess. The numbers do not repeat often, though. Quite the opposite: it is certain that every number appears just once. And it could be anything within a 64-bit range.
Hm, I ask myself how much time the number-to-string conversion itself takes. Once you remove all the copying of temporary strings etc., you probably already end up with a pretty fast solution.
> Parallelization certainly could be helpful in some scenarios, but I have a multi-threaded application already. So splitting up the serialization might increase speed for the serialization itself, while slowing down the total process due to increased context switches. I therefore want to have the single-threaded operation as efficient as possible.
Well, of course you'd have to do the optimisation first. Just using more processors doesn't make the processing itself faster.
Regards,
Jens Weller
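Since the question of how much time the conversion really takes can only be settled by measuring, a small timing helper along the following lines may be useful. It is a rough sketch: elapsed_ms and serialize_baseline are hypothetical names, std::uint64_t stands in for the uint64 typedef of the original post, and a single run on one machine is of course not a proper benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Run `fn` once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsed_ms(Fn fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);               // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Baseline in the spirit of the first attempt: operator<< into a string stream.
// Note that ostream_iterator writes the separator after every element,
// so the result ends with a trailing space.
inline std::string serialize_baseline(const std::vector<std::uint64_t>& values) {
    std::ostringstream os;
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<std::uint64_t>(os, " "));
    return os.str();
}
```

Wrapping each candidate in elapsed_ms([&] { result = serialize_baseline(integers); }) and comparing the numbers over several runs gives a first impression of where the time goes.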
Roland,
that's not the fastest way, for another reason as well: stringstream, like all iostream classes, is designed to work in thread-safe applications and calls lock/unlock pairs for _every_ character inserted. This is done by constructing the ostream::sentry guard. To be more efficient, write your output to the streambuf classes directly.
Try adapting this example to your needs:
http://codepad.org/ujfeLB72
When I did some tests, this gave me a performance factor of 3 or 4 over common iostream.
With kind regards,
Ovanes
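The codepad example itself is not reproduced in this archive. As a rough sketch of the general technique (not the linked code), the following writes digits directly into a std::streambuf with sputn/sputc, so no operator<< and therefore no ostream::sentry is involved. The function names are made up for illustration. Whether a sentry actually takes a lock depends on the standard library implementation, and std::stringbuf::str() still returns a copy of the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Write `value` in decimal straight into a stream buffer, bypassing
// operator<< and the sentry constructed for each formatted insertion.
inline void put_decimal(std::streambuf& sb, std::uint64_t value) {
    char buf[20];                        // enough for any 64-bit unsigned value
    char* const end = buf + sizeof buf;
    char* p = end;
    do {
        *--p = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    assert(p >= buf);
    sb.sputn(p, end - p);                // unformatted bulk write
}

// Serialize all values, each followed by a space, like
// ostream_iterator<T>(out, " ") does.
inline std::string serialize_via_streambuf(const std::vector<std::uint64_t>& values) {
    std::stringbuf sb(std::ios_base::out);
    for (std::uint64_t v : values) {
        put_decimal(sb, v);
        sb.sputc(' ');
    }
    return sb.str();                     // note: this still copies the buffer
}
```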
Ovanes,
thanks for the explanation and the link.
I must admit that I avoided taking closer looks at the streambuf classes, but I will change that now :-)
Regards,
Roland
Ovanes,
I wonder, though, is there a standard way to write integers to a streambuf? As far as I can see, the ostreambuf_iterator takes character classes as template arguments only.
Regards,
Roland
On Sun, Mar 1, 2009 at 3:36 PM, Roland Bock wrote:
> Ovanes,
> I wonder, though, is there a standard way to write integers to a streambuf? As far as I can see, the ostreambuf_iterator takes character classes as template arguments only.
> Regards,
> Roland
Roland,
please take a look at this code: http://codepad.org/UGubU7cw
A customized numeric converter is almost always faster than writing integers to a stringstream with operator<<. That said, the two tests perform about equally on linux/g++ (but I tested only very roughly). On the MSVC compiler, for example, std::copy was faster than the for-loop. I assume the compiler can optimize better in this case.
Good luck,
Ovanes
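The linked code is not included in this archive. One way such a customized converter could look, as a sketch under assumptions and not the codepad example: a small functor converts each integer to digits itself and pushes the characters through a std::ostreambuf_iterator<char>, which is how the character-only iterator can still be used for integers. DecimalWriter and serialize_with_functor are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Functor that formats integers itself and writes the resulting characters
// through an ostreambuf_iterator, which only accepts character types.
struct DecimalWriter {
    std::ostreambuf_iterator<char> out;

    explicit DecimalWriter(std::streambuf* sb) : out(sb) {}

    void operator()(std::uint64_t value) {
        char buf[20];                    // enough for any 64-bit unsigned value
        char* const end = buf + sizeof buf;
        char* p = end;
        do {
            *--p = static_cast<char>('0' + value % 10);
            value /= 10;
        } while (value != 0);
        assert(p >= buf);
        out = std::copy(p, end, out);    // copy the digits into the buffer
        *out++ = ' ';                    // separator after each number
    }
};

inline std::string serialize_with_functor(const std::vector<std::uint64_t>& values) {
    std::ostringstream os;               // used only as owner of the stream buffer
    std::for_each(values.begin(), values.end(), DecimalWriter(os.rdbuf()));
    return os.str();
}
```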
Ovanes,
ok, got it! Thanks for the nice examples and your patience. I learned a lot today :-)
In the meantime I created my own FormattedBackInserter (see attached code, just a very crude prototype as of now), which allows me to do the job even faster via std::copy.
Regards,
Roland
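The attachment with FormattedBackInserter is not part of this archive. Purely as a guess at what an output iterator of that kind might look like, here is a sketch of an iterator that formats each assigned integer and appends it to a std::string, so that std::copy can drive it. The class name DecimalBackInserter and all details are assumptions, not Roland's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Output iterator: assigning an integer appends its decimal digits plus a
// space to the target string. (Hypothetical stand-in for the attachment.)
class DecimalBackInserter {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit DecimalBackInserter(std::string& target) : target_(&target) {}

    DecimalBackInserter& operator=(std::uint64_t value) {
        char buf[20];                    // enough for any 64-bit unsigned value
        char* const end = buf + sizeof buf;
        char* p = end;
        do {
            *--p = static_cast<char>('0' + value % 10);
            value /= 10;
        } while (value != 0);
        assert(p >= buf);
        target_->append(p, static_cast<std::size_t>(end - p));
        target_->push_back(' ');
        return *this;
    }

    // Dereference and increment are no-ops, as for std::back_insert_iterator.
    DecimalBackInserter& operator*() { return *this; }
    DecimalBackInserter& operator++() { return *this; }
    DecimalBackInserter operator++(int) { return *this; }

private:
    std::string* target_;
};

inline std::string serialize_with_inserter(const std::vector<std::uint64_t>& values) {
    std::string result;
    result.reserve(values.size() * 11);  // rough guess, not a tuned value
    std::copy(values.begin(), values.end(), DecimalBackInserter(result));
    return result;
}
```

The appeal of this design is that the result string is written in place: there is no stream, no sentry and no final copy out of a buffer.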
Roland,
your example looks very interesting and is clearly a very intelligent approach.
Best Regards,
Ovanes
On Sun, Mar 1, 2009 at 7:47 PM, Roland Bock wrote:
> Ovanes,
> ok, got it! Thanks for the nice examples and your patience. I learned a lot today :-)
> In the meantime I created my own FormattedBackInserter (see attached code, just a very crude prototype as of now), which allows me to do the job even faster via std::copy.
> Regards,
> Roland
:-)
Thanks and regards from Bavaria,
Roland
> Anyway these two tests perform most equally on linux/g++ (but tested very roughly).
Actually, one more addition. I had tested the behavior for linux/g++ using codepad, where I posted the example. It is possible that the executable codepad compiled was built without thread support, so that the sentry takes no locks there. That would explain why it performed the same as the stream buffer approach.
Regards,
Ovanes
participants (3)
- Jens Weller
- Ovanes Markarian
- Roland Bock