
----- Original Message -----
From: "Maxim Yegorushkin" <e-maxim@yandex.ru>
To: <boost@lists.boost.org>
Sent: Monday, June 13, 2005 11:59 PM
Subject: Re: [boost] [Ann] socketstream library 0.7

[snip]
... And too slow. You have one data copy from the kernel into the streambuf, and another one from the streambuf to the message object. The same goes for output: message -> streambuf -> socket. This is unacceptable, at least for [snip] 30% of user time was spent in guess where? In zeroing out memory in std::vector<char>::resize(). And you are talking about data copying here...
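[To make that cost concrete: resize() value-initializes the new bytes just before read() overwrites them. A minimal sketch of the pattern that triggers it, and one common way to keep it off the hot path; the function names and the plain POSIX read() call are my own illustration, not code from the socketstream library:]

#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Costly pattern: every resize() zeroes the new bytes, only for read()
// to overwrite them immediately afterwards. (Error handling omitted.)
std::vector<char> read_message_zeroing(int fd, std::size_t len)
{
    std::vector<char> buf;
    buf.resize(len);                     // value-initializes len bytes to 0
    ::read(fd, &buf[0], buf.size());     // then overwrites them
    return buf;
}

// One workaround: keep a buffer per connection, size it once to the largest
// expected message, and track the useful length separately, so resize() is
// never called per message.
struct recv_buffer
{
    std::vector<char> storage;
    std::size_t used;

    explicit recv_buffer(std::size_t max_len) : storage(max_len), used(0) {}

    ssize_t fill(int fd, std::size_t len)    // precondition: len <= storage.size()
    {
        ssize_t n = ::read(fd, &storage[0], len);
        used = n > 0 ? static_cast<std::size_t>(n) : 0;
        return n;
    }
};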
Given that the protocol of your application has a built-in way of announcing the length of the payload, your requirement is met by the streambuf::sgetn(char_type*, streamsize) method, backed by a properly specialized implementation of the protected virtual xsgetn method. [snip]
So you get operator semantics for free. :-) And perhaps even a putback area, if there's one provided by this particular streambuf implementation.
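[For readers who have not written one: the shape of that specialization is roughly as follows. This is purely a sketch with invented names, not the socketstream library's code; it only shows how an overridden xsgetn() can read straight from the socket into the caller's buffer, so that sgetn() on a known payload length avoids an extra copy through the streambuf's internal array.]

#include <streambuf>
#include <sys/types.h>
#include <unistd.h>

class socket_streambuf : public std::streambuf
{
public:
    explicit socket_streambuf(int fd) : fd_(fd) {}

protected:
    virtual std::streamsize xsgetn(char* s, std::streamsize n)
    {
        std::streamsize total = 0;
        while (total < n)
        {
            ssize_t got = ::read(fd_, s + total,
                                 static_cast<std::size_t>(n - total));
            if (got <= 0)        // EOF or error: return what we have
                break;
            total += got;
        }
        return total;
    }

private:
    int fd_;
};

// Usage: after parsing a length header, drain exactly that many bytes.
//     socket_streambuf sb(fd);
//     std::vector<char> payload(len);
//     std::streamsize got = sb.sgetn(&payload[0], len);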
Sounds interesting, but I don't see how this can work well with nonblocking sockets. You have to store how many bytes have already been read/sent somewhere. [snip]
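[That bookkeeping can be as small as an offset kept next to the buffer. A sketch, with invented names, of what "storing how many bytes have been read" looks like in practice for a nonblocking socket:]

#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <vector>

// The reader remembers how much of the current message has arrived and
// resumes from that offset on the next readiness notification.
struct partial_read
{
    std::vector<char> buf;   // sized to the announced payload length
    std::size_t done;        // bytes received so far

    explicit partial_read(std::size_t len) : buf(len), done(0) {}

    // Returns true once the whole message is in buf; call again when the
    // socket becomes readable if it returned false.
    bool resume(int fd)
    {
        while (done < buf.size())
        {
            ssize_t got = ::read(fd, &buf[0] + done, buf.size() - done);
            if (got > 0)
                done += static_cast<std::size_t>(got);
            else if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                return false;            // no more data yet; state is kept
            else
                return false;            // EOF/error handling omitted
        }
        return true;
    }
};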
There is really interesting material here. There is also other stuff that I feel obliged to comment on :-)

1. Some of the contortions suggested to effectively read messages off an iostream socket are not specific to the fundamental network/socket goals, i.e. they are just difficulties associated with iostream-ing.

2. Some of those same contortions are in response to the different media (not sure if that's the best term) that the stream is running over, i.e. a network transport. This is more ammunition for anyone trying to shoot the sync-iostream-model-over-sockets down, or at least to suggest that the model is a constraint on those writing iostream-based network apps.

3. Lastly, some of the observations (while excellent) seem a bit "macro", when a wider view might lead to different thinking. What I am trying to suggest here is that the time spent in vector<>::resize is truly surprising, but it's also very low-level. Having been through 4 recent refactorings of a network framework, I have been surprised at the gains made in other areas by conceding, say, byte-level processing in another area.

To make more of a case around the last point, consider the packetizing, parsing and copying that has recently been discussed. This has been related to the successful recognition of a message on the stream. Is it acknowledged that a message is an arbitrarily complex data object? By the time an application is making use of the "fields" within a message, that's probably a reasonable assumption. So at some point these fields must be "broken out" of the message. Or parsed. Or marshalled. Or serialized. Is the low-level packet (with the length header and body) being scanned again? What copying is being done? This seems like multi-pass to me.

To get to the point: I am currently reading blocks off network connections and presenting them to byte-by-byte lexer/parser routines. These form the structured network messages directly, i.e. the fields are already plucked out (see the sketch below).

So which is better: direct byte-by-byte conversion to a structured network message, or multi-pass?

Cheers.
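P.S. To make "byte-by-byte conversion to a structured network message" concrete, here is a rough sketch of the style I mean. The message layout ('|'-separated fields terminated by '\n') and all names are invented for illustration; the real routines are more involved. The point is the shape of the code: whatever blocks come off the socket are fed straight to consume(), which builds the structured message field by field, so there is no second pass over a raw packet buffer.

#include <cstddef>
#include <string>
#include <vector>

struct message
{
    std::vector<std::string> fields;
};

class message_lexer
{
public:
    // Feed a block; returns the number of complete messages now available.
    std::size_t consume(const char* data, std::size_t len)
    {
        for (std::size_t i = 0; i != len; ++i)
        {
            char c = data[i];
            if (c == '|')                  // field terminator
            {
                current_.fields.push_back(field_);
                field_.clear();
            }
            else if (c == '\n')            // message terminator
            {
                current_.fields.push_back(field_);
                field_.clear();
                complete_.push_back(current_);
                current_ = message();
            }
            else
            {
                field_ += c;               // accumulate field bytes
            }
        }
        return complete_.size();
    }

    std::vector<message> complete_;        // structured messages, ready to use

private:
    message current_;
    std::string field_;
};

// Usage sketch: blocks are handed over exactly as read() returns them; the
// lexer keeps its own state across blocks, so partial messages are fine.
//     char block[4096];
//     ssize_t n = ::read(fd, block, sizeof block);
//     if (n > 0) lexer.consume(block, static_cast<std::size_t>(n));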