
On 12/28/05, Christopher Kohlhoff <chris@kohlhoff.com> wrote:
> Hi Rene,
>
> --- Rene Rivera <grafik.list@redshift-software.com> wrote:
> > The sender doesn't discard.
>
> I didn't mean that the sender discards, only that the datagram
> being sent can be discarded immediately by the TCP/IP stack if the
> receiving socket's buffer is full. The sender is not aware of this.
>
> > If it were any other way the test that I wrote would show
> > "missing" receives.
>
> It does indeed exhibit many missing receives on two of my systems
> (one Linux, the other Mac OS X, both uniprocessor).
I think I'm seeing dropped packets when I run the test on an SMP Linux 2.4-based system (see below). With Linux 2.6, the packets all appear to make it to the server, but we have the sync-faster-than-async condition.
> > But Caleb's results show all messages arriving, as would be
> > expected from the localhost device.
>
> I suspect that Caleb is using a multiprocessor machine, and so the
> behaviour I described does not happen for him. However on my systems
> with "flow control" enabled it is still necessary to include the
> additional performance optimisations to get the substantial
> improvement. It would be interesting to rerun the test on a
> multiprocessor machine with these changes included.
I've run the tests on an SMP machine and a single-CPU box. The SMP machine is running a Red Hat 2.4.21 kernel and the test actually doesn't work properly on this platform. It seems that the receiver drops many packets and, because of the way the test is written, the program crashes when trying to print the incomplete result data. I added some debugging print statements that show the async_server receives about 75,000 of the 600,000 packets sent before it is forcefully stopped.

When I run the tests on my single-CPU machine at home, running Linux 2.6.14.3, the test completes properly. I've tried both the epoll- and select-based reactors and the results are nearly identical (approx. 3x slower than sync).

I think it's difficult to say where the speed difference comes from in these tests. They may exercise some inefficiencies or limitations of the Linux UDP stack. Perhaps the syscall overhead of using select/epoll is where all the performance is lost. Or it could be the architecture of asio that is at fault. I don't think we can say for sure at this point.

I'll see if I can't knock up a straight socket-based test like this one (e.g. just POSIX socket calls, no asio) and see if it gets similar results to Rene's benchmark. The same benchmark run on some other UNIX-based system (Mac OS X, *BSD, etc.) would be another interesting data point, to see whether this is an implementation issue or a platform issue.

Just a quick back-of-the-envelope calculation: the synchronous test manages to handle about 65,000 1k messages a second, which amounts to a bandwidth of approximately 500 Megabits/second. It would certainly be nice to be able to sustain this level of throughput, but I'm not sure it's an entirely realistic workload.

--
Caleb Epstein
caleb dot epstein at gmail dot com