Concatenation of many std::string
I'm working on an application with many std::string concatenations, everywhere. This consumes a lot of CPU due to temporary strings, memory allocation, etc. Because of the size of the existing code, it is not possible to make big changes.

I first tried to transform expressions of this kind:

    std::string a = b + c + d + e ;

... into ...

    std::string a(b); a += c ; a += d ; a += e ;

... and it is definitely faster, but not enough.

I then worked on a method that just adds a special object after the '=' sign and turns these concatenations into a two-pass operation: the first pass calculates the total length and allocates the destination string, and the second pass actually copies the source strings into the destination. One just needs to rewrite the concatenations this way:

    std::string a = my_special_object() + b + c + d + e ;

But maybe I am reinventing the wheel. Is there a way in Boost to speed up this kind of operation?

Thanks in advance.
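For readers following along, here is a minimal sketch of how a two-pass object like `my_special_object` could be built, assuming C++11. The names `cat_start` and `cat_expr` are hypothetical; this is not Remi's implementation, which the thread does not show. Each `+` only records a reference to its operand, so the only allocation is the single `reserve` when the chain is finally converted to `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical start object: plays the role of my_special_object().
struct cat_start {
    std::size_t size() const { return 0; }
    void append_to(std::string&) const {}
};

// One node per '+': holds the earlier chain by value (it only contains
// references) and a reference to the string added at this step.
template <typename Left>
struct cat_expr {
    Left left;
    const std::string& right;

    std::size_t size() const { return left.size() + right.size(); }

    void append_to(std::string& out) const {
        left.append_to(out);
        out += right;
    }

    operator std::string() const {
        std::string out;
        out.reserve(size());  // pass 1: total length, one allocation
        append_to(out);       // pass 2: copy the characters in order
        return out;
    }
};

inline cat_expr<cat_start> operator+(cat_start start, const std::string& s) {
    return cat_expr<cat_start>{start, s};
}

template <typename Left>
cat_expr<cat_expr<Left>> operator+(const cat_expr<Left>& chain, const std::string& s) {
    return cat_expr<cat_expr<Left>>{chain, s};
}
```

Usage mirrors Remi's example: `std::string a = cat_start() + b + c + d + e;`. Because the nodes hold references, the chain should be converted within the same full-expression; keeping it in a variable and converting later can dangle if any operand was a temporary.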
On Thu, 2005-02-10 at 18:48 +0100, Chateauneu Remi wrote:
I'm working on an application with many std::string concatenations, everywhere. This is consuming a lot of CPU resources, due to temporary strings, memory allocation, etc... Due to the size of the existing code, it is not possible to make big changes.
I first tried to transform expressions of this kind: std::string a = b + c + d + e ;
... into ...
std::string a(b); a += c ; a += d ; a += e ;
It's a little ugly, but you can do

    std::string a;
    a.reserve (b.size () + c.size () + d.size ());
    a += b += c += d;
... and it is definitely faster, but not enough.
I then worked on a method that just adds a special object after the '=' sign and turns these concatenations into a two-pass operation: the first pass calculates the total length and allocates the destination string, and the second pass actually copies the source strings into the destination.
One just needs to rewrite the concatenations this way: std::string a = my_special_object() + b + c + d + e ;
But maybe I am reinventing the wheel. Is there a way in Boost to speed up this kind of operation?
I'm not aware of anything in Boost; the string algo lib seemed like a natural place, but I didn't see anything.

I wrote something like that. I'm not sure it's really safe and correct and all; I believe the temporary lifetime it relies on is guaranteed ....

You can write:

    std::string s1; s1 << concat () + a + b + c;

or

    std::string s2 = concat () + a + b + c;

I'm not really sure it's worth the obfuscation factor, but if you want to see it, let me know. It's not complicated, but there's probably more than one way to skin the cat.

-- t. scott urban <scottu@apptechsys.com>
I'm not aware of anything in Boost; the string algo lib seemed like a natural place, but I didn't see anything.
I wrote something like that. I'm not sure if it's really safe and correct and all - I believe the temporary lifetime relied on is guaranteed ....
You can write:
std::string s1; s1 << concat () + a + b + c;
or
std::string s2 = concat () + a + b + c;
Hmmm, this sounds interesting to me. I have to admit that's an idea I've never thought of before. Am I right to assume that the temporary concat object holds some kind of container of pointers to the concatenated strings? And that each operator+ will push_back into this container? And that, finally, only the assignment to a string will calculate the resulting string?

I guess the trick is how to avoid memory allocation while calling push_back. One can use a regular by-value, fixed-size C array of pointers as a concat class member, but that complicates the code for what to do when it's full. Any other ideas?

Just a few thoughts...
Yuval
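One possible answer to the "what to do when it's full" question, sketched under the assumption that spilling into a heap-allocated vector is acceptable for unusually long chains. The class name `small_concat` and the capacity of 8 are hypothetical choices for illustration, not anything posted in the thread: the first few operand pointers live in a fixed array inside the object, so typical chains allocate only the result string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical collector: pointers to the first kInline operands are kept in
// a fixed array member; only longer chains fall back to a std::vector.
class small_concat {
public:
    small_concat() : inline_(), count_(0), len_(0) {}

    small_concat& operator+(const std::string& s) {
        if (count_ < kInline)
            inline_[count_] = &s;
        else
            overflow_.push_back(&s);  // reached only when the array is full
        ++count_;
        len_ += s.size();
        return *this;
    }

    operator std::string() const {
        std::string out;
        out.reserve(len_);  // one allocation for the result
        for (std::size_t i = 0; i < count_; ++i)
            out += (i < kInline) ? *inline_[i] : *overflow_[i - kInline];
        return out;
    }

private:
    static const std::size_t kInline = 8;  // arbitrary capacity for this sketch
    const std::string* inline_[kInline];
    std::vector<const std::string*> overflow_;
    std::size_t count_;
    std::size_t len_;
};
```

As with any pointer-collecting design, the operands must outlive the conversion, which holds when the whole chain is written in one statement such as `std::string r = small_concat() + a + b + c;`.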
On Thu, 2005-02-10 at 18:48 +0100, Chateauneu Remi wrote:
std::string a = my_special_object() + b + c + d + e ;
std::string s1; s1 << concat () + a + b + c; std::string s2 = concat () + a + b + c;
If you want to see it, let me know.
Yes, thanks! If you wish, I can send mine (330 lines, 11 KB). Could we take the best of both, and maybe put it in the string algo lib?
Jim, I don't see a fastcat method in my STL. I am on Windows XP with VS 2003. Is this part of the standard string or the Boost library?

_____
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Jim Lear
Sent: Friday, February 11, 2005 10:37 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Concatenation of many std::string

I know next to nothing about C++ and the STL, so forgive my ignorance. But couldn't this performance problem be solved by calling a method that concatenates multiple std::strings? E.g.

    a.fastcat(b, c, d, e); // returns &a

Maybe this is naive, but does every problem require an operator?

-- Jim Lear (512) 228-5532 (work) (512) 293-7248 (cell)
I see.

_____
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Jim Lear
Sent: Friday, February 11, 2005 11:39 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Concatenation of many std::string

No, fastcat does not exist. I'm suggesting that it may be easier for Chateauneu to create an optimized method (fastcat or such) that accepts multiple operands rather than wrestling with operators that accept only two operands.

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Fri, 2005-02-11 at 10:39 -0600, Jim Lear wrote:
No, fastcat does not exist. I'm suggesting that it may be easier for Chateauneu to create an optimized method (fastcat or such) that accepts multiple operands rather than wrestling with operators that accept only two operands.
You're probably right. You could do something like:

    using std::string;
    void fastcat (string & dest, const string & s1);
    void fastcat (string & dest, const string & s1, const string & s2);
    // etc.

That's explicit unrolling of the loops in my toy concat class. Unless you only have a few cases you care about, this is tedious. It's probably possible to use some template machinery and/or preprocessor magic to create a form that takes an arbitrary (up to some limit) number of strings to concatenate.

Another option is to rely on the compiler's intimate knowledge of its own library implementation to do all this for you. I don't know if real compilers do this, but it seems like a reasonable request.

-- t. scott urban <scottu@apptechsys.com>
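The "template machinery" Scott mentions became much simpler after this thread was written. A minimal sketch, assuming C++17 (variadic templates and fold expressions, neither of which existed in 2005) and assuming every part is a `std::string`; `fastcat` is the hypothetical name used in this thread, not a standard or Boost function. One definition replaces the hand-written overload per argument count, and it keeps the same two passes: sum the lengths, reserve once, then append.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical variadic fastcat: appends all parts to dest and returns dest.
// Every element of Parts is assumed to be std::string (needs size()/append).
template <typename... Parts>
std::string& fastcat(std::string& dest, const Parts&... parts) {
    std::size_t total = dest.size();
    ((total += parts.size()), ...);  // pass 1: sum of all lengths
    dest.reserve(total);             // at most one reallocation of dest
    (dest.append(parts), ...);       // pass 2: append in argument order
    return dest;
}
```

Returning `dest` by reference matches Jim's "returns &a" idea, so calls can be chained or used inline.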
On Fri, 11 Feb 2005 13:14:11 -0600, Jim Lear <jim.lear@legerity.com> wrote:
So, I assume the '...' parameter, à la printf, is all but forbidden. It's too bad there's no good way to pass a variable number of arguments in C++. Of course one could pass a vector of std::string references, but it's beyond my feeble abilities to understand the performance effects of constructing a temporary vector.
You could try a signature like this:

    std::string fastcat(std::string s0,
                        std::string s1 = std::string(),
                        /* ...and as many more as you want or can be bothered to put in... */
                        std::string sn = std::string())

Empty strings should be relatively cheap to construct (no heap?), and you get a simulation of a variable number of arguments.

Stuart Dootson
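A minimal, compilable version of Stuart's default-argument idea, with one deliberate change: the parameters are `const std::string&` instead of by value. With by-value parameters each argument may be copied on the way in, and how expensive that is depends on the library implementation. The limit of four slots and the name `fastcat` are illustrative assumptions. Unused trailing parameters bind to empty temporaries, which live until the end of the full-expression containing the call.

```cpp
#include <cassert>
#include <string>

// Hypothetical fastcat with defaulted const-reference parameters (C++03-compatible).
std::string fastcat(const std::string& s0,
                    const std::string& s1 = std::string(),
                    const std::string& s2 = std::string(),
                    const std::string& s3 = std::string()) {
    std::string result;
    // Pass 1: size the result once; pass 2: copy each piece in order.
    result.reserve(s0.size() + s1.size() + s2.size() + s3.size());
    result += s0;
    result += s1;
    result += s2;
    result += s3;
    return result;
}
```

Adding more slots means editing the signature, which is the "up to some limit" trade-off Scott describes.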
On Fri, 2005-02-11 at 11:22 +0100, Chateauneu Remi wrote:
On Thu, 2005-02-10 at 18:48 +0100, Chateauneu Remi wrote:
std::string a = my_special_object() + b + c + d + e ;
std::string s1; s1 << concat () + a + b + c; std::string s2 = concat () + a + b + c;
If you want to see it, let me know. Yes, thanks ! If you wish, I can send mine (330 lines, 11 k-bytes). Could we take the best of both, and maybe put it in the string algo lib ?
Attached. The way I did it is pretty short, 50 lines or so of implementation, but like I said it relies on lifetimes of temporaries and order of operations, which I think are guaranteed, but I'm not certain. Also, a quick stress test doesn't show any improvement in speed over the naive method, but that would depend greatly on the lifetime of the destination string, the size and number of strings you're concatenating, the behavior of your implementation (e.g. mine penalizes multi-threaded std::string usage heavily), etc. Perhaps you have a way to solve it in a more effective manner. In case I haven't put enough caveats on this: this is just toy code; I haven't put, and wouldn't put, this in production code without more testing, profiling, etc.

Regards
-- t. scott urban <scottu@apptechsys.com>
Hi, just an idea. What about:

    std::string s1 = fastcat() + a + b + c;

where:

    #include <cstddef>
    #include <string>
    #include <vector>

    class fastcat {
        std::vector<const std::string *> list;  // operands, in order
        std::size_t len;                        // running total length
    public:
        fastcat() : len(0) {}
        fastcat & operator+ (const std::string & s) {
            list.push_back(&s);
            len += s.size();
            return *this;
        }
        operator std::string() const {
            std::string s;
            s.reserve(len);  // allocate the result once
            for (std::size_t i = 0; i < list.size(); ++i)
                s += *list[i];
            return s;
        }
    };

I didn't test it.

Martin

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of t. scott urban
Sent: Friday, February 11, 2005 6:06 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Concatenation of many std::string

On Fri, 2005-02-11 at 11:22 +0100, Chateauneu Remi wrote:
On Thu, 2005-02-10 at 18:48 +0100, Chateauneu Remi wrote:
std::string a = my_special_object() + b + c + d + e ;
std::string s1; s1 << concat () + a + b + c; std::string s2 = concat () + a + b + c;
If you want to see it, let me know. Yes, thanks ! If you wish, I can send mine (330 lines, 11 k-bytes). Could we take the best of both, and maybe put it in the string algo lib ?
Attached. The way I did it is pretty short - 50 lines or so of implementation, but like I said it relies on lifetimes of temporaries and order of operations which I think are guaranteed, but not certain. Also, a quick stress test doesn't show any improvement in speed over the naive method, but that would depend greatly on the lifetime of the destination string, size and number of strings you're concatenating, the behavior of your implementation (e.g. mine penalizes multi-threaded std::string usage heavily), etc. Perhaps you have a way to solve it in a more effective manner. In case I haven't put enough caveats on this, this is just toy code, I haven't and wouldn't put this in production code without more testing, profiling, etc. Regards -- t. scott urban <scottu@apptechsys.com>
For me, this kind of thing belongs to compiler optimization. In Java, javac does that by using StringBuffer instead of String for these concatenations. Does anybody know if the most widely used C++ compilers (gcc, Borland, MS, Intel, etc.) do such an optimization?

Thanks,
Mauricio Gomes
Pensar Digital
phone: 55-11-4121-6287
mobile: 55-11-8319-9610
http://pensardigital.com
On Thu, 10 Feb 2005 18:47:11 -0800, t. scott urban <scottu@apptechsys.com> wrote:
It's a little ugly, but you can do
std::string a; a.reserve (b.size () + c.size () + d.size ()); a += b += c += d;
That's about as fast as you can get. If STL library authors used expression templates, you could do this just as fast with a = b + c + d + e;

See around here
http://lists.boost.org/MailArchives/boost/msg55339.php
and
http://lists.boost.org/MailArchives/boost/msg55341.php
and an alternative, which may be better:
http://lists.boost.org/MailArchives/boost/msg55413.php

Doesn't help you now though, sorry.

Matt.
matthurd@acm.org
On Sat, 2005-02-12 at 06:29 +1100, Matt Hurd wrote:
On Thu, 10 Feb 2005 18:47:11 -0800, t. scott urban <scottu@apptechsys.com> wrote:
It's a little ugly, but you can do
std::string a; a.reserve (b.size () + c.size () + d.size ()); a += b += c += d;
That's about as fast as you can do.
Yep, I borked the last line, though:

    a += b;
    a += c;
    a += d;
If stl library authors used expression templates you do this just as fast with a = b + c + d + e; See round about here http://lists.boost.org/MailArchives/boost/msg55339.php and http://lists.boost.org/MailArchives/boost/msg55341.php and an alternative though which may be better http://lists.boost.org/MailArchives/boost/msg55413.php
interesting. -- t. scott urban <scottu@apptechsys.com>
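Putting Scott's reserve suggestion together with his corrected append lines, and extending it to all four operands of the original example (his reserve call left out `e`), gives the sketch below. The function name `concat4` is hypothetical. The original one-liner `a += b += c += d;` groups right to left, as `a += (b += (c += d))`, so it modifies `c` and `b` along the way (and would not compile if they were const), which is why separate statements are needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: size the destination once, then append each piece.
std::string concat4(const std::string& b, const std::string& c,
                    const std::string& d, const std::string& e) {
    std::string a;
    a.reserve(b.size() + c.size() + d.size() + e.size());  // single allocation
    a += b;
    a += c;
    a += d;
    a += e;
    return a;
}
```

The operands are taken by const reference, so they are left unchanged.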
participants (9)
- Chateauneu Remi
- Jim Lear
- Martin Dluhoš
- Matt Hurd
- Mauricio Gomes
- Nicholas Cardi
- Stuart Dootson
- t. scott urban
- Yuval Ronen