
Christopher Kohlhoff wrote:
I believe there might be more similarity there already than you think, so I'm going to do a bit of experimentation and see what comes out of it. However the dual support for IPv4 and IPv6 raised by Jeff might be a bit of a problem -- is it something you address in your network lib?
(Looks at old code...) Yes, more or less. The internal implementation has support for IPv6, but it is not exported in the public interface; it is just a matter of instantiating a stream_template<inet::ipv6>. Unfortunately I didn't test it (I have no IPv6 experience), but from the Stevens book and from SUSv3 it seems that IPv6 sockets are backward compatible with IPv4 (i.e. IPv6 address resolvers can take IPv4 addresses, and IPv6 sockets can connect to and accept IPv4 streams). I think the cleaner interface would be an ip::stream that is instantiated as an ipv6::stream if there is system support, or as an ipv4::stream if there is none. ipv4::stream and ipv6::stream should still be available if the user explicitly needs them (i.e. no compatibility), but the default should be to use ip::stream.
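Something along these lines (plain BSD sockets only, to illustrate the backward compatibility; this is not part of either library):

    // Sketch only: a dual-stack listener using plain BSD sockets. An
    // AF_INET6 socket with IPV6_V6ONLY cleared also accepts IPv4
    // connections, which show up as IPv4-mapped addresses (::ffff:a.b.c.d).
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <string.h>

    int make_dual_stack_listener(unsigned short port)
    {
        int fd = socket(AF_INET6, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        // Allow IPv4 clients on the same socket.
        int off = 0;
        setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off));

        sockaddr_in6 addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin6_family = AF_INET6;
        addr.sin6_addr = in6addr_any;
        addr.sin6_port = htons(port);

        if (bind(fd, (sockaddr*)&addr, sizeof(addr)) < 0 ||
            listen(fd, SOMAXCONN) < 0)
            return -1;

        return fd;
    }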
I see what you mean now; however, I don't think they can be portably decoupled. Some platforms will require access to shared state in order to perform the operation. The acceptor socket caching you mentioned is also a case for having access to this.
Sometimes you need to make some hard decisions. Are there any real-life protocols/platforms that need shared state? If so, then member functions are fine. If not, or if only some theoretical, obscure protocol needs it, it should not impact the generic interface; for those obscure or rarely used protocols/platforms a hidden singleton could be used. In any case, even if you keep the shared-state solution, I think that the proactor should be decoupled from the protocol implementation. Using your current model, the protocol_service and the proactor_service should be two different objects. In fact, multiple protocols must be able to reuse the same proactor.
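Roughly what I have in mind (the names are made up; they only show the intended structure, not any existing interface):

    // Hypothetical sketch: one proactor, shared by any number of
    // protocol services. Protocol-specific shared state lives in the
    // protocol service, completion dispatching in the proactor.
    class proactor_service
    {
    public:
        void run() { /* dispatch queued completions */ }
    };

    template <typename Protocol>
    class protocol_service
    {
    public:
        explicit protocol_service(proactor_service& p) : proactor_(p) {}
        // e.g. an acceptor socket cache would live here
    private:
        proactor_service& proactor_;
    };

    // Several protocols reusing the same proactor:
    //   proactor_service proactor;
    //   protocol_service<ipv4_tcp>     tcp(proactor);
    //   protocol_service<local_stream> local(proactor);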
I suspect that, if I adopt a type-per-protocol model, the service will also be associated with the protocol in some way (i.e. the protocol might be a template parameter of the service class).
Seems good... it is probably the same thing I proposed in the previous paragraph.
<snip>
Of course non-blocking operations are useful, but as there is no public readiness-notification interface (i.e. a reactor), their use is somewhat complicated.
What I mean is that the readiness notification isn't required, since one way of interpreting an asynchronous operation is "perform this operation when the socket is ready". That is, it corresponds to the non-blocking operation that you would have made when notified that a socket was ready. A non-blocking operation can then be used for an immediate follow-up operation, if desired.
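For example (written with present-day Boost.Asio names, which postdate this discussion, so treat it only as a sketch of the idea):

    // The async operation acts as the readiness notification; non-blocking
    // reads then drain whatever extra data is immediately available.
    #include <boost/asio.hpp>
    #include <array>
    #include <cstddef>

    std::array<char, 4096> data;

    void consume(const char*, std::size_t) { /* process the bytes */ }

    void on_read(boost::asio::ip::tcp::socket& sock,
                 const boost::system::error_code& ec, std::size_t n)
    {
        if (ec) return;
        consume(data.data(), n);

        // Immediate non-blocking follow-up: keep reading until the
        // socket would block, i.e. until already-buffered data is gone.
        boost::system::error_code follow_ec;
        sock.non_blocking(true);
        for (;;)
        {
            std::size_t m = sock.read_some(boost::asio::buffer(data), follow_ec);
            if (follow_ec == boost::asio::error::would_block)
                break;                  // nothing more available right now
            if (follow_ec)
                return;                 // real error or EOF
            consume(data.data(), m);
        }

        // Re-issue the asynchronous read: "perform this when ready again".
        sock.async_read_some(boost::asio::buffer(data),
            [&sock](const boost::system::error_code& e, std::size_t k)
            { on_read(sock, e, k); });
    }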
<snip>
What about parametrizing the buffered_stream with a container type, and providing an accessor to this container? The container buffer can then be swap()ed, splice()d, reset(), fed to algorithms, and much more without any copying, while still preserving the stream interface. Instead of a buffered stream you can think of it as a stream adaptor for containers. I happily used one in my library, and it really simplifies code, along with a deque that provides segmented iterators over the contiguous buffers.
I think a separate stream adapter for containers sounds like a good plan.
I have to add that my buffered_adapter is actually more than a stream adapter for containers, because it has an associated stream and can bypass the buffer if the read or write request is big enough. The adapter also has underflow() (you call it fill() in your buffered_stream) and flush().
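A rough sketch of the shape of such an adapter (hypothetical names, not the actual interface of my library or yours, and written with present-day Boost.Asio buffer helpers):

    // A stream adaptor parametrised on a container type. Small reads go
    // through the container buffer; big reads bypass it. The container
    // itself is exposed so it can be swapped or fed to algorithms
    // without copying.
    #include <boost/asio.hpp>
    #include <vector>
    #include <cstddef>

    template <typename Stream, typename Container = std::vector<char> >
    class container_stream
    {
    public:
        explicit container_stream(Stream& next, std::size_t bypass = 4096)
            : next_(next), bypass_(bypass) {}

        Container& rdbuf() { return buffer_; }   // direct access, e.g. for swap()

        std::size_t underflow()                  // cf. buffered_stream::fill()
        {
            char tmp[4096];
            std::size_t n = next_.read_some(boost::asio::buffer(tmp));
            buffer_.insert(buffer_.end(), tmp, tmp + n);
            return n;
        }

        std::size_t read_some(boost::asio::mutable_buffer out)
        {
            if (buffer_.empty() && boost::asio::buffer_size(out) >= bypass_)
                return next_.read_some(out);     // big request: bypass the buffer
            if (buffer_.empty())
                underflow();
            std::size_t n = boost::asio::buffer_copy(out,
                boost::asio::buffer(buffer_.data(), buffer_.size()));
            buffer_.erase(buffer_.begin(), buffer_.begin() + n);
            return n;
        }

    private:
        Stream& next_;
        Container buffer_;
        std::size_t bypass_;
    };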
BTW, is there a safe way to read data directly into a deque? Or do you mean that the deque contains multiple buffer objects?
No, there is no portable way to read data directly into a std::deque<char>. But I did not use std::deque; I made my own deque with segmented iterators and no default construction of PODs. Actually it only works with PODs right now, but it is fairly complete; I have even added versions of some standard algorithms with support for segmented iterators. It should not be too hard to add non-POD support (not that a net lib really needs it...).
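To show the idea (a hypothetical chunk_buffer, not my actual class): the point is that the contiguous blocks can be exposed directly as a scatter/gather sequence, which std::deque does not allow because it hides its internal blocks:

    #include <boost/asio.hpp>
    #include <list>
    #include <vector>
    #include <cstddef>

    class chunk_buffer
    {
    public:
        static const std::size_t chunk_size = 4096;

        // Append one fixed-size chunk and expose it as a mutable buffer.
        boost::asio::mutable_buffer prepare_chunk()
        {
            chunks_.push_back(std::vector<char>(chunk_size));
            return boost::asio::buffer(chunks_.back());
        }

        // Expose enough chunks for n bytes as one scatter/gather sequence.
        std::vector<boost::asio::mutable_buffer> prepare(std::size_t n)
        {
            std::vector<boost::asio::mutable_buffer> seq;
            while (n > 0)
            {
                seq.push_back(prepare_chunk());
                n = (n > chunk_size) ? n - chunk_size : 0;
            }
            return seq;
        }

    private:
        std::list<std::vector<char> > chunks_;   // each chunk has a stable address
    };

    // usage (single gather read filling several chunks):
    //   chunk_buffer buf;
    //   std::size_t n = sock.read_some(buf.prepare(16384));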
I'm already thinking of possible extensions... shared_buffers, gift_buffers (I need a better name for the last one) and more.
I take it that by "shared_buffers" you mean reference-counted buffers?
Yes, exactly.
If so, one change I'm considering is to add a guarantee that a copy of the Mutable_Buffers or Const_Buffers object will be made and kept until an asynchronous operation completes. At the moment a copy is only kept until it is used (which for Win32 is when the overlapped I/O operation is started, not when it ends).
Hmm, nice, but if you want to forward a buffer to another thread, for example, you want to forward the complete type (to preserve the counter and the acquire()/release() infrastructure). I think it should be possible to implement streams that, in addition to generic buffer objects, accept specialized per-stream buffer types and guarantee special treatment for them. For example, an in-process shared memory stream would copy the buffer when write() is called with a generic buffer, but would give special treatment to a dedicated shared buffer type.

I have only shared memory streams in mind for now, but think about a network transport implemented completely in user space (with direct access to the network card): it is theoretically possible to DMA directly from user buffers to the card buffers, but it might only be possible from specially aligned memory. Case in point: the Linux AIO implementation requires that a file is opened in O_DIRECT mode. In turn, O_DIRECT requires that the supplied buffer is aligned to 512-byte boundaries (or the filesystem block size on 2.4). This means that an asio-based asynchronous disk I/O subsystem would require its buffers to be specially allocated (or fall back to an extra copy).

This requirement can easily be met if a hypothetical asio::fs_stream has a direct_buffer_allocator typedef. The allocator would return objects of type direct_buffer, and fs_stream.async_{read|write}_some would be overloaded to explicitly support these buffers. If a direct_buffer is used, fs_stream will use native Linux AIO. If a generic buffer is used, fs_stream should *not* use Linux AIO, not even with an internal, properly aligned bounce buffer, because O_DIRECT bypasses the system caches, so it should only be done when the user explicitly requests it by using direct_buffers. The fallback should probably use worker threads. Btw, future versions of Linux AIO will almost certainly support non-direct asynchronous I/O, but the O_DIRECT mode will probably still be fast-pathed.

In the end this boils down to passing the exact buffer type to the lower levels of the asio implementation.
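To make the fs_stream example a bit more concrete (fs_stream and direct_buffer are hypothetical; only the 512-byte alignment requirement comes from O_DIRECT itself):

    // Hypothetical sketch of overloading on the buffer type.
    #include <stdlib.h>
    #include <cstddef>

    class direct_buffer
    {
    public:
        explicit direct_buffer(std::size_t size)
        {
            // O_DIRECT requires 512-byte (or block-size) aligned buffers.
            if (posix_memalign(&data_, 512, size) != 0)
                data_ = 0;
            size_ = size;
        }
        ~direct_buffer() { free(data_); }
        void* data() const { return data_; }
        std::size_t size() const { return size_; }
    private:
        void* data_;
        std::size_t size_;
    };

    class fs_stream
    {
    public:
        typedef direct_buffer direct_buffer_type;

        // Specially allocated buffer: fast path through native Linux AIO.
        template <typename Handler>
        void async_write_some(const direct_buffer&, Handler)
        { /* submit via the native AIO interface on an O_DIRECT descriptor */ }

        // Generic buffer: fall back to worker threads, never O_DIRECT, so
        // system caches are only bypassed when the user asked for it.
        template <typename ConstBuffers, typename Handler>
        void async_write_some(const ConstBuffers&, Handler)
        { /* hand off to a blocking write on a thread pool */ }
    };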
However, this may make using a std::vector<> or std::list<> of buffers too inefficient, since a copy must be made of the entire vector or list object. I will have to do some measurements before making a decision, but it may be that supporting reference-counted buffers is a compelling enough reason.
Usually the vector is small, and boost::array is probably a better fit. In the latter case the buffer is cached (as it is on the stack), is very small (fewer than 20 elements), and takes very little time to copy. In the case of vectors, if move semantics are available (because they are emulated by the standard library, as the next libstdc++ does, or because of future language developments), no copy is needed.
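For example, with present-day Boost.Asio (only an illustration): the buffer sequence lives in a boost::array on the stack, and copying it copies two pointer/length pairs, not the data they refer to.

    #include <boost/array.hpp>
    #include <boost/asio.hpp>
    #include <string>

    void send_message(boost::asio::ip::tcp::socket& sock,
                      const std::string& header, const std::string& body)
    {
        boost::array<boost::asio::const_buffer, 2> bufs = {{
            boost::asio::buffer(header),
            boost::asio::buffer(body)
        }};
        boost::asio::write(sock, bufs);   // gather-write of both pieces
    }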
Btw, did you consider my proposal for multishot calls?
I have now :)
I think that what amounts to the same thing can be implemented as an adapter on top of the existing classes. It would work in conjunction with the new custom memory allocation interface to reuse the same memory. In a way it would be like a simplified interface to custom memory allocation, specifically for recurring operations. I'll add it to my list of things to investigate.
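Something like the following, sketched with present-day names purely to illustrate the shape of such an adapter: one call to start() produces a stream of callbacks, re-arming itself from its own completion handler and reusing the same buffer.

    #include <boost/asio.hpp>
    #include <array>
    #include <cstddef>
    #include <functional>

    class recurring_read
    {
    public:
        recurring_read(boost::asio::ip::tcp::socket& sock,
                       std::function<void(const char*, std::size_t)> on_data)
            : sock_(sock), on_data_(on_data) {}

        void start() { arm(); }

    private:
        void arm()
        {
            sock_.async_read_some(boost::asio::buffer(buf_),
                [this](const boost::system::error_code& ec, std::size_t n)
                {
                    if (ec) return;          // stop on error or close
                    on_data_(buf_.data(), n);
                    arm();                   // re-arm: same buffer, same logic
                });
        }

        boost::asio::ip::tcp::socket& sock_;
        std::function<void(const char*, std::size_t)> on_data_;
        std::array<char, 4096> buf_;
    };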
I don't think it is worth doing at higher levels. Multishot calls are inconvenient because you lose the one call -> one callback guarantee. I proposed adding them because they can open many optimization opportunities at lower levels (fewer system calls to allocate the callback, possibly better cache locality of callback data, and fewer syscalls to register readiness-notification interest). Ah, btw, happy new year :) --- Giovanni P. Deretta ---