
Hi Darryl,

--- Darryl Green <darryl.green@unitab.com.au> wrote:

<snip>
I think the design as described by the concepts section is very good, but it would be better if the implementation offered finer-grained selection of the desired concepts. Perhaps this is just an aesthetic concern, and it would be easy to modify the existing services, but if these really are separate concepts, why are they bundled together? I'm concerned that there may be some lack of flexibility in the implementation/interface actually offered that isn't apparent from the design. The issue of separating out the sync vs async interface has already been raised;
This has been extensively discussed elsewhere, but in summary I can see no compelling reason why there should be a separate low-level sync-only interface, and many advantages to having them combined. However I still intend to rename the demuxer to make its purpose clearer for all sorts of applications.
I would also like to see the ability to expose a reactive rather than proactive interface by using a reactive_stream_socket_service in place of an async_stream_socket_service. The concepts would seem to allow for this.
This is not consistent with the goal of portability. As I mentioned in the "reactor versus proactor" thread, the proactor is a more portable pattern than the reactor. In the longer term I'm not averse to allowing a reactor abstraction to become part of some secondary public interface, where the restricted portability is explicitly noted.
One item that I have difficulty in discerning a clear design/rationale for is the demuxer. The demuxer as implemented is a repository of services (another woolly concept, as far as I can see) and something that provides the run, interrupt and related functions to control the scheduling/dispatching of handlers in reaction to changes in Async_Object state. Is there anything preventing a more policy-based approach to assembling services?
The demuxer is composed of services in a similar way to how std::locale is composed of facets. The demuxer is not necessarily aware of, or coupled to, the service types that it contains. For example, the SSL implementation uses a service that is only added in to the demuxer if SSL is used. The service used by a particular public interface (e.g. basic_stream_socket) is already a policy template parameter. One example program shows how basic_stream_socket can be instantiated with a custom service to perform logging, where this service is in turn implemented using the standard stream_socket_service. There's no reason why, in principle, policy-driven services could not be added to allow further customisation. But these would be inherently non-portable, and so would only be part of some secondary public interface.
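To make the facet analogy a little more concrete, here is a minimal, self-contained sketch of the general technique (this is not asio's implementation; the names service_registry, use_service and hypothetical_ssl_service are purely illustrative): services are created lazily and looked up by type, so the container need not know about any particular service in advance.

  #include <map>
  #include <string>
  #include <typeinfo>
  #include <utility>

  class service_registry
  {
  public:
    ~service_registry()
    {
      for (std::map<std::string, holder_base*>::iterator i =
             services_.begin(); i != services_.end(); ++i)
        delete i->second;
    }

    // Return the service of type Service, creating it on first use.
    // The registry itself is not coupled to any concrete service type.
    template <typename Service>
    Service& use_service()
    {
      std::string key = typeid(Service).name();
      std::map<std::string, holder_base*>::iterator i = services_.find(key);
      if (i == services_.end())
        i = services_.insert(
              std::make_pair(key, new holder<Service>)).first;
      return static_cast<holder<Service>*>(i->second)->service;
    }

  private:
    struct holder_base { virtual ~holder_base() {} };
    template <typename Service>
    struct holder : holder_base { Service service; };
    std::map<std::string, holder_base*> services_;
  };

  // An SSL-specific service, say, is only ever created if some code
  // actually asks for it.
  struct hypothetical_ssl_service { int session_count; };

  int main()
  {
    service_registry registry;
    hypothetical_ssl_service& ssl =
        registry.use_service<hypothetical_ssl_service>();
    ssl.session_count = 0;
  }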
I would like to see the aspects of a socket that have nothing to do with reading/writing better separated, as this would make for a more consistent interface for other forms of I/O such as pipes, message queues etc as well as files.
For stream-oriented I/O, this separation is presented in the form of the Stream concept. This concept is implemented by stream_socket, ssl::stream, buffered_read_stream and so on. It would be implemented by a hypothetical pipe class.
A very simple active object template that supports read and write, but relies on free functions or helper classes for less universal operations such as bind, setting options, etc., would, when used as a base class, allow appropriate (and, in my opinion, necessary) use of runtime polymorphism.
I have to disagree on runtime polymorphism: compile-time polymorphism should always be the default. Runtime polymorphism can impose additional costs (such as virtual function calls on a fine-grained interface), and eliminates potential optimisations. This adds up considerably if there are many layers of runtime polymorphism involved. Runtime polymorphism can be layered on top of compile-time polymorphism using some sort of wrapper. E.g. for the stream concept:

  class abstract_stream { ... };

  template <typename Stream>
  class stream : public abstract_stream { ... };

  stream<ssl::stream<stream_socket> > s(...);
  runtime_polymorphic_function(s);

I am not certain if this is something that should be included in asio or not, since the appropriate place to introduce runtime polymorphism can vary according to the design of an application. What do you think? I'd be happy to add it if you think it generally useful.

<snip>
The intention was for this general idea to support other types of "file descriptor" that don't offer conventional read and write interfaces, and to allow other interfaces (such as proactive ones) to use the same demuxing and related infrastructure.
I had hoped to have time to investigate introducing asio as a replacement/alternative to this library, but the timing hasn't been good for that. In any case, I don't think asio would be a good fit for a system like the following (where the other lib is being used):
Components support TCP and SSL for WAN connections, and Unix domain sockets locally. Throw in some weird stuff like communicating over special-purpose point-to-point links implemented as character devices. There are large chunks of code that have no need to care what exact type of "socket" they are using. Some of this code is itself in dynamically linked libraries. There are multiple processes using these libraries.
Short of placing a wrapper around asio to make it more amenable to separate compilation and to introduce runtime polymorphism, I can't see how I could use asio effectively in this system, or anything similar.
Would the polymorphic wrapper I proposed above address these issues?
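To make that wrapper idea a little more concrete, here is a slightly fuller sketch (a sketch only, not something that exists in asio today; the choice of a single async_read_some operation, its exact signature, and the boost::function-based handler type-erasure are my assumptions, chosen to keep the example small):

  #include <asio.hpp> // asio's convenience header (assumed)
  #include <boost/function.hpp>
  #include <cstddef>

  // The runtime-polymorphic interface. A real version would mirror the
  // whole Stream concept; only one operation is shown for brevity.
  class abstract_stream
  {
  public:
    virtual ~abstract_stream() {}
    virtual void async_read_some(void* data, std::size_t length,
        boost::function<void(const asio::error&, std::size_t)> handler) = 0;
  };

  // Adapts any type modelling the Stream concept to the interface above.
  template <typename Stream>
  class stream : public abstract_stream
  {
  public:
    explicit stream(Stream& next) : next_(next) {}

    virtual void async_read_some(void* data, std::size_t length,
        boost::function<void(const asio::error&, std::size_t)> handler)
    {
      // Forward to the compile-time-polymorphic implementation.
      next_.async_read_some(asio::buffer(data, length), handler);
    }

  private:
    Stream& next_;
  };

  // Code from here down sees only abstract_stream.
  void runtime_polymorphic_function(abstract_stream& s);

  template <typename Stream>
  void wrap_and_use(Stream& next)
  {
    stream<Stream> s(next);
    runtime_polymorphic_function(s); // virtual dispatch from here on
  }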
As far as I can see asio doesn't even allow interchangeable use of an ssl and a tcp stream.
As I mentioned above, asio addresses this using compile time polymorphism, with the Stream concept being implemented by stream_socket and ssl::stream. For example, both work equally well with free functions like asio::async_read.
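For instance (a fragment only, not complete code; header_buf_, header_length and handle_read_header are assumed to be defined as in the examples further down):

  // The same function template serves both transports; no inheritance
  // relationship between stream_socket and ssl::stream is required.
  template <typename Stream>
  void start_header_read(Stream& s)
  {
    async_read(s, buffer(header_buf_, header_length), handle_read_header);
  }

  void start(stream_socket& s) { start_header_read(s); }
  void start(ssl::stream<stream_socket>& s) { start_header_read(s); }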
I think this is unacceptable in a general-purpose library. I'm also concerned about the potential code bloat from unnecessary duplication of code (not from different template parameters, but from outright duplication in different templates).
Can you please point these out? I'm always in favour of reducing code bloat.
Implementation
The code is clean; however, I'm very concerned about the performance (or lack thereof) and the weight that a combination of the implementation and some design decisions have introduced. The documentation mentions the memory usage overhead of the proactor emulation on systems using select/epoll. However, it is clear that there is considerable allocation and construction overhead involved (and not only on those systems). While much discussion has centered on allocator performance, I'm concerned that the user/library constructor code invoked for each I/O operation is likely to be non-trivial as well. For bulk transfers, it is likely that these issues won't be too severe, but for systems where low-latency handling of small messages is critical, it is quite possible that these overheads will be comparable to or larger than any other part of the processing for a message/transaction.
I don't deny that there may be room for performance improvement in the implementation. However, I recently tested the custom allocation with Win32 I/O completion ports and found that it performs almost identically to synchronous I/O, so I don't think the public interface imposes any unnecessary inefficiencies.
The design supports, and the documentation encourages, breaking the handling of structured messages into a small header read followed by a larger read (with an appropriate handler for the expected body/chunk), while the implementation appears to perform some quite heavyweight operations compared to just reading a small header. I find this to be almost an encouragement of an antipattern for efficient I/O.
I don't agree that asio favours this approach over others. But I do believe that this approach is perfectly valid and extremely useful, since it permits asynchronous code to be written in terms of "contracts". I.e. perform this operation (or operations) and don't come back to me until it's all done or an error has occurred. For code where absolute efficiency is not required, this technique allows a clearer and more elegant expression of intent. Furthermore, the cost of multiple small read system calls can be mitigated by leveraging the compile-time polymorphism of the Stream concept and replacing:

  stream_socket my_socket;

with:

  buffered_read_stream<stream_socket> my_socket;
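To show what that substitution looks like in use (a fragment only; my_demuxer, header_buf_, header_length and handle_read_header are assumed to be defined as in the surrounding examples): because buffered_read_stream also models the Stream concept, nothing else needs to change.

  // The constructor argument shown here is an assumption; it is intended
  // to be forwarded to the underlying stream_socket.
  buffered_read_stream<stream_socket> my_socket(my_demuxer);

  // The calling code is otherwise identical; small header reads are now
  // satisfied from the internal buffer rather than by separate read()
  // system calls.
  async_read(my_socket, buffer(header_buf_, header_length),
      handle_read_header);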
I would expect an optimistic reactive approach (read the header when notified, determine the size, handler, etc., and opportunistically attempt a read of the body) to be significantly faster.
Let's take your example of a fixed-size header followed by a body, and compare and contrast a reactive approach with the alternatives offered by asio.

---------------------

1. Reactive.

Here you might have a single handler that is informed when a socket is ready for reading. In this handler you perform non-blocking reads.

  void handle_read(...)
  {
    size_t bytes_read;
    switch (state_)
    {
    case reading_header:
      // Attempt non-blocking read of header.
      bytes_read = read(sock_, header_buf_ + header_bytes_so_far_,
          header_length - header_bytes_so_far_);
      header_bytes_so_far_ += bytes_read;

      // Check if we have complete header.
      if (header_bytes_so_far_ < header_length)
        return; // Keep waiting for more.

      // Got the whole header now.
      body_length_ = decode_body_length(header_buf_);
      body_buf_ = ...; // size body buffer appropriately
      body_bytes_so_far_ = 0;
      state_ = reading_body;

      // Fall through to make opportunistic read of body.

    case reading_body:
      // Attempt non-blocking read of body.
      bytes_read = read(sock_, body_buf_ + body_bytes_so_far_,
          body_length_ - body_bytes_so_far_);
      body_bytes_so_far_ += bytes_read;

      // Check if we have complete body.
      if (body_bytes_so_far_ < body_length_)
        return; // Keep waiting for more.

      // Got whole body now.
      notify_app(...);

      // Start over.
      state_ = reading_header;
    }
  }

2. Asio with multiple reads.

This is the approach where we issue asynchronous reads for exactly the length of the header, and then the body. It may be less efficient, but only because of the additional pass through the demultiplexer, not because it makes more read() system calls than the above. However, feel free to compare in terms of readability :) The main reason it's clearer is that you no longer have to deal with short reads -- async_read's contract takes care of that for you.

  void start_read()
  {
    // Issue read for entire header.
    async_read(sock_, buffer(header_buf_, header_length),
        handle_read_header);
  }

  void handle_read_header(const error& e)
  {
    if (!e)
    {
      // Got header, determine body length.
      body_length_ = decode_body_length(header_buf_);
      body_buf_ = ...; // size body buffer appropriately

      // Issue read for entire body.
      async_read(sock_, buffer(body_buf_, body_length_),
          handle_read_body);
    }
  }

  void handle_read_body(const error& e)
  {
    if (!e)
    {
      // Got whole body now.
      notify_app(...);

      // Start over.
      start_read();
    }
  }

3. Asio with multiple reads + opportunistic body read.

This extends number 2 by adding in the opportunistic read of the body. In effect this makes it equivalent to your reactive read approach, but I think you'll still find it's easier to follow -- again because most of the code to handle short reads is eliminated.

  void start_read()
  {
    // Issue read for entire header.
    async_read(sock_, buffer(header_buf_, header_length),
        handle_read_header);
  }

  void handle_read_header(const error& e)
  {
    if (!e)
    {
      // Got header, determine body length.
      body_length_ = decode_body_length(header_buf_);
      body_buf_ = ...; // size body buffer appropriately

      // Make opportunistic non-blocking read of body.
      error read_error;
      size_t bytes_read = sock_.read_some(
          buffer(body_buf_, body_length_), assign_error(read_error));

      // Did we get whole body?
      if (bytes_read == body_length_ || read_error)
      {
        handle_read_body(read_error);
      }
      else
      {
        // Issue read for rest of body.
        async_read(sock_,
            buffer(body_buf_ + bytes_read, body_length_ - bytes_read),
            handle_read_body);
      }
    }
  }

  void handle_read_body(const error& e)
  {
    if (!e)
    {
      // Got whole body now.
      notify_app(...);

      // Start over.
      start_read();
    }
  }

4. Asio with bulk reads.
This approach is exemplified by the HTTP server example. Essentially it reads as much as is available (up to some limit of course) in one go, and then processes it using some state machine similar to that in number 1. It is generally the most efficient since it requires the fewest number of read system calls or passes through the demuxer.

  void handle_read(const error& e, size_t bytes_read)
  {
    if (!e)
    {
      process_data(buffer_, bytes_read);
      async_read(sock_, buffer(buffer_, buffer_length), handle_read);
    }
  }

5. Asio with transfer_at_least(...).

In certain applications you may be able to combine the opportunistic read into the initial asynchronous read operation, by using the transfer_at_least() completion condition. This executes the operation with a contract that it transfer a minimum number of bytes. In this way you can avoid short read handling for the header, while still maximising the number of bytes transferred in a single operation.

  void start_read()
  {
    // Issue read for entire header, and maybe we'll get some
    // body as a bonus.
    boost::array<mutable_buffer, 2> bufs = {
      buffer(header_buf_, header_length),
      buffer(body_buf, max_body_length) };
    async_read(sock_, bufs, transfer_at_least(header_length),
        handle_read_header);
  }

  ...

---------------------
It is implied that an appropriate demuxer/demuxer service can be used to implement various multi-threaded processing schemes. This may well be true. However, an alternative approach is to simply consider the affinity of an Async_Object and its demuxer(s) (one for read, one for write) to be dynamic and run a number of separate (per thread) demuxers, moving Async_Objects between them as required. In this way the handlers for a given Async_Object will only run in the expected context.
This is not portable. With Win32 I/O completion ports for example, once a socket is associated with an I/O completion port it cannot be disassociated. The demuxer design reflects this.
I imagine other approaches are possible to achieve similar results, but it is not clear (to me) from the design documentation or a quick look at the code that this is the case. I haven't reviewed the code to a level of detail where I can be sure I am understanding the dispatching done by the epoll and select reactors, but it looks to me as though multiple threads may all wait on the same fdset (or equivalent)? Is this right? Intended? I had expected some form of modified leader/followers pattern here?
No, only one thread in the pool waits on select/epoll/kqueue. Any other threads in the pool wait on a condition.
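For anyone trying to picture that arrangement, here is a highly simplified, self-contained sketch of the pattern being described (this is not asio's code, and it uses modern standard-library primitives purely for brevity): one thread at a time blocks in the demultiplexer while the remaining pool threads block on a condition variable until there are handlers to run.

  #include <condition_variable>
  #include <functional>
  #include <mutex>
  #include <queue>

  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()> > ready_handlers;
  bool a_thread_is_polling = false;

  // Hypothetical: blocks in select/epoll/kqueue, pushes completed
  // handlers onto ready_handlers (locking m) and notifies cv.
  void block_in_demultiplexer();

  void pool_thread_run()
  {
    std::unique_lock<std::mutex> lock(m);
    for (;;)
    {
      if (!ready_handlers.empty())
      {
        // Run one completed handler outside the lock.
        std::function<void()> h = ready_handlers.front();
        ready_handlers.pop();
        lock.unlock();
        h();
        lock.lock();
      }
      else if (!a_thread_is_polling)
      {
        // This thread becomes the single thread waiting on the
        // demultiplexer.
        a_thread_is_polling = true;
        lock.unlock();
        block_in_demultiplexer();
        lock.lock();
        a_thread_is_polling = false;
        cv.notify_all();
      }
      else
      {
        // All other threads in the pool wait on the condition.
        cv.wait(lock);
      }
    }
  }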
Conclusion
As I mentioned above, had the review been at a different time I would have at least attempted to port some code (even if only test/benchmark code) from an in-house reactor library to asio. As it is, my comments are based on a fair bit of experience with network and async (in its most general sense) I/O, from the bit level (literally) up to the application level, on a number of OSs/kernels/platforms and libraries (including ACE), and on a few hours spent reading the code and docs. I have a healthy distrust of my (and everyone else's) guesses regarding performance, and the author and review manager should consider my points above regarding performance as areas for investigation only.
A fair amount of the above appears to be a request for reactor support (which it is, partly), but I would consider mechanisms that support very lightweight proactor use highly desirable irrespective of reactor support. In particular, it would seem that some form of re-use of a single handler and buffer for an Async_Object instance, without any allocation/(copy)construction of anything heavier than a pointer, should be quite practical.
Asio's interface already supports "lightweight", reusable handlers, since ultimately a handler is just a function object. If you wish, you can implement your handlers so that they are no more expensive than a pointer:

  class my_handler
  {
  public:
    explicit my_handler(my_class* p) : p_(p) {}
    void operator()(const asio::error& e) { p_->handle_event(e); }
  private:
    my_class* p_;
  };

<snip>

Thanks for taking the time to write such an extensive review.

Cheers,
Chris