
From: Thore Karlsen <sid@6581.com>
On Fri, 19 Aug 2005 09:08:59 -0400, Rob Stewart <stewart@sig.com> wrote:
The performance problems of requiring vector<char> or char[N] exist on several levels:
- For vector<char>, there is the initialisation of the chars to 0 on construction or when you do a resize. Note that this is proportional to the size of the vector, not necessarily to the amount of data transferred. I have seen this have a noticeable cost in a CPU-bound server handling thousands of connections.
Don't construct a vector of a given size or use resize(), then. Rely on reserve() instead.
Then size() will return the wrong value, and relying on capacity() is
No, size() will correctly indicate that there are no objects in the vector.
not a good idea. If you're thinking about pushing data onto the vector
Why is relying on capacity() a bad idea?
as it's read, that's also bad, because then you'd have to read into a temporary buffer first and copy it to the vector after. (Or do multiple resizes.)
At some point, the vector has to have sufficient elements in it so that you can copy new values onto them. Otherwise, the vector won't know it has real elements. Yes, that means you do pay for the initialization at some point and, yes, that isn't desirable.
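To make the reserve()/resize() distinction above concrete, here is a minimal sketch. It assumes C++11 for vector::data(), and fake_read() is a hypothetical stand-in for a socket read, not part of asio or any real library.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a socket read: copies up to `max` bytes into
// `dst` and returns how many bytes were "received".
std::size_t fake_read(char* dst, std::size_t max) {
    static const char payload[] = "hello";
    std::size_t n = sizeof(payload) - 1;  // 5 bytes, terminator excluded
    if (n > max) n = max;
    std::memcpy(dst, payload, n);
    return n;
}

// reserve() allocates storage but creates no elements, so size() stays 0
// and the reserved bytes may not be written through the vector's interface.
std::size_t size_after_reserve(std::size_t n) {
    std::vector<char> v;
    v.reserve(n);
    return v.size();
}

// resize() creates `capacity` zero-initialised chars before any data
// arrives; that fill is proportional to the buffer size, not to the
// number of bytes actually transferred.
std::vector<char> read_with_resize(std::size_t capacity) {
    std::vector<char> v;
    v.resize(capacity);                               // zero-fills every byte
    std::size_t got = fake_read(v.data(), v.size());  // read into real elements
    v.resize(got);                                    // keep only what was read
    return v;
}
```

With a 64-byte buffer and a 5-byte read, the vector still pays for zeroing all 64 bytes, which is the cost being debated.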
I think it's a very bad idea to require vector<char> or a static array. Christopher does a good job of explaining the drawbacks, and I agree with him. I also do high performance asynchronous networking in my server and client applications, and a library requiring vector<char> or a static array would be completely useless to me. Most of the time I don't have the data I want to send in a vector or in a static array, and most of the time the amount of data is too big to send or receive a whole buffer at a time.
I understand.
- Requiring a copy from a native data structure into vector<char> or char[N]. If I have an array of doubles, say, I should be able to send it as-is to a peer that has identical architecture and compiler. Avoiding unnecessary data copying is a vital part of implementing high performance protocols.
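As an illustration of sending a native array as-is, the sketch below exposes an array of doubles as a pointer and byte count without copying it. The names byte_range and raw_bytes are hypothetical, invented for this example, and the result is only meaningful to a peer with the same architecture and compiler, as stated above.

```cpp
#include <cstddef>

// Hypothetical pointer-plus-length pair describing existing memory.
struct byte_range {
    const char* data;
    std::size_t size;
};

// View a built-in array as raw bytes. Nothing is copied: the returned
// pointer refers to the caller's own storage.
template <typename T, std::size_t N>
byte_range raw_bytes(const T (&arr)[N]) {
    byte_range r = { reinterpret_cast<const char*>(arr), sizeof(arr) };
    return r;
}
```

A (pointer, size) pair like this is what a void*-based interface would accept directly.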
Agreed. OTOH, *if* a user used a vector<double> instead of the array you mention, then with swap() the vector won't add overhead.
Why would a swap be necessary?
If asio used a vector, the user could swap its contents into his own vector. (That's not the current interface, but then neither is vector in the current interface. It was just informational for the discussion.)
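A small sketch of why swap() adds no copying overhead: swapping two vectors exchanges their internal storage rather than their elements. Here library_buf is a hypothetical stand-in for a vector owned by the library, and C++11 is assumed for data().

```cpp
#include <vector>

// Returns true if, after the swap, the user's vector owns the very same
// storage the "library" vector held, meaning no element was copied.
bool swap_transfers_storage() {
    std::vector<double> library_buf(1000, 1.5);  // hypothetical library-owned data
    const double* before = library_buf.data();

    std::vector<double> user_buf;
    user_buf.swap(library_buf);                  // constant time, no element copies

    return user_buf.data() == before
        && user_buf.size() == 1000
        && library_buf.empty();
}
```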
Instead, how about a std::vector-like class that takes a user-defined, fixed-size block of memory?
No, that would still require a copy if the data isn't already in such a buffer. void * (or unsigned char *, or char *, or whatever) HAS to be there, otherwise the library is useless. Such a class could be an option (and I would like to see it as an option), but not a requirement.
I don't think you understand what I'm suggesting. Notice that I used the word "takes." Furthermore, I think you snipped the details about how that class would use the buffer handed to it. The data has to be in some memory somewhere. The class I'm suggesting can be told to use that memory and can be told how much memory is available. Then, whenever asio needs to read data into the caller's buffer, or write data from the caller's buffer, it is taken from that preexisting buffer.
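One way to picture the class being suggested is the minimal sketch below. Everything in it is an assumption made for illustration: the name buffer_view and the members unused(), unused_size() and commit() are hypothetical and are not part of asio or of any proposal in this thread. The class does not own or copy the memory; it only tracks how much of the caller's buffer holds valid data.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical non-owning, vector-like wrapper over caller-supplied memory.
class buffer_view {
public:
    typedef char* iterator;

    // `data` points at memory the caller already owns; `size` is how many
    // bytes of it already contain valid data.
    buffer_view(char* data, std::size_t capacity, std::size_t size = 0)
        : data_(data), capacity_(capacity), size_(size) {
        if (size > capacity) throw std::length_error("buffer_view: size > capacity");
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    iterator begin() { return data_; }
    iterator end() { return data_ + size_; }
    char& operator[](std::size_t i) { return data_[i]; }

    // Appends in place; refuses to write past the end of the caller's block.
    void push_back(char c) {
        if (size_ == capacity_) throw std::length_error("buffer_view: full");
        data_[size_++] = c;
    }

    // Space a read operation could fill directly, with no temporary buffer.
    char* unused() { return data_ + size_; }
    std::size_t unused_size() const { return capacity_ - size_; }

    // Records that `n` bytes were written into unused() by such a read.
    void commit(std::size_t n) {
        if (n > unused_size()) throw std::length_error("buffer_view: overrun");
        size_ += n;
    }

private:
    char* data_;
    std::size_t capacity_;
    std::size_t size_;
};
```

Because the wrapper only holds a pointer and two counts, wrapping an existing buffer costs no copy and no zero-fill, and the bounds checks are what prevent overruns.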
In my applications I can't afford to copy data from my internal buffers to whatever the networking library requires. I also can't put the data in such buffers to begin with.
So wrap your internal buffers with the class I'm suggesting. No copy is needed. The class simply provides a standard interface, complete with push_back(), iterators, random access, etc., and prevents buffer overruns.

--
Rob Stewart stewart@sig.com
Software Engineer http://www.sig.com
Susquehanna International Group, LLP using std::disclaimer;