[boost] Re: [network] An RFC - updated

22 Apr 2005

      Hi Peter,
...
I like the proposal. It shows a certain polish that only
comes with real-world experience with a design. ;-)
Thanks!
...
I have the following comments for now, based on a
not-very-deep look at the document and the header:
Thanks for all the feedback.
...
1. network root class
In my opinion, this class is not needed and should be
removed. It complicates the design, forces a global
variable on the user, and is not library-friendly. For
example, if lib1 and lib2 use the network, the user 
can't pass an address created by lib1 to lib2. The
global state modeled by the network class should be an
implementation detail.
In many of my uses, I have multiple network-derived types in use at
the same time (serial, HTTP tunnel, TCP/IP). I agree that it can be
an issue, so perhaps there should be a way to associate code with a
particular kind of network and not need the network object. Having
said that, though, I think this assumed context should only be a
default to be used when the calling code has no network object in
hand. All libraries should accept and pass along the context.
Otherwise, one could not reuse protocols over different networks in
the same app.

While this can be seen as an issue, most of the time I find that I
need to pass something along already: an address object or a stream,
for example. That is sufficient to get back to the network object. It
is also OK to create multiple network instances of the same type,
though it should not be done with impunity<g>.
...
2. address
The address should be a standalone class and not tied
to a particular network.
I think different networks (in my uses anyway) have different
expectations (data size, textual form, etc.). This lead me to the
conclusion that an address is an abstract idea. Its primary role is a
factory for other objects related to the address it describes, which
is again not something a concrete type would be able to handle (at
least w/o indirection hidden inside).
...
It should be a simple entity and not a hierarchical
container. Logical to physical resolution should take
a source logical address and produce a container of
physical addresses.
This container is often needed internally and, in my experience,
seldom needed by the user. It is helpful for the logical address
object to know the corresponding physical addresses and not have to
repeat the resolution process. I see no fundamental problem exposing
this collection as std::vector<address_ptr> (or whatever) and not
providing an interface to encapsulate it, but I was trying to keep
things purely abstract.
...
The address should not be created with an URL, that is,
a TCP connection to www.example.com should be represented
by the address tcp:/www.example.com:80, leaving
http://www.example.com reserved for the stream obtained
by the HTTP GET / query, not for the raw protocol stream.
A connection to the serial port should probably be
represented by "com1:/38400,n,8,1", and so on.
This scheme morphing seems to fit with the idea that network
instances are behind the scenes, but creates complexity for the user
as well since these transformations must be done at that level. For
serial use, it is not sufficient to say "com1" (that was a concept I
had early on as well). That would address an entire network, not a
specific "port"-like concept.

I think the "U" in URL is really fitting for this model. There is a
lot of thought behind URL's and they have the right level of
generality. IMHO, we do not need another textual form for addresses.
...
I think that the UDP broadcast address should be
represented by udp:/0.0.0.0, and the TCP loopback
should be tcp:/127.0.0.1, as usual. But I may be
wrong.
All of these are forcing TCP/IP (IPv4 even<g>) concepts to the user.
The central idea of the abstraction I proposed is that "datagram" and
"stream" are the behavior-bundles to which one should write
protocols, not TCP/UDP or sockets. The notion of loopback and
broadcast can carry over from one network type to another, but these
textual forms do not.

That said, this will work given an IPv4 network object in my
proposal:

  strm = net->new_address("127.0.0.1:80")->new_stream();

It is just not portable to IPv6 or any other network type. Hence, one
is better off saying:

  strm = net->new_loopback_address()->new_stream();
...
3. Minor stylistic issues
p->get_X() should be p->X()
Personally, I go back and forth on this<g>. I suppose that
std::basic_string<>::length is the right form to follow. I currently
like the get_X/set_X symmetry, but will change back in the proposal.
...
p->new_Y() should probably be p->create_Y()
Is that a preferred verb for Abstract Factory? I should reread that
chapter... :) I chose "new" because it advertised exactly what must
happen: a new instance is made. Not that I am wedded to it, if there
is ample precedent for "create".
...
although it might be better to move to
net::create_Y( p ) or even to Y( p ) and reference
semantics under the hood.
The approach I took (based on shared_ptr <g>) allowed for clear
ownership, copy and layering semantics. In other words:

   net::stream_ptr stream = ...;

   stream = ssl::new_client_stream(stream);

At this point, the new SSL stream impl drives the original stream
which it can rely upon to exist until it is done with it. From the
user's point of view, the SSL stream is now _the_ stream.
...
The destructors should be protected.
Agreed.
...
4. MT-centric model
The general idea of an asynchronous callback dispatcher
is sound, but the threading decision should be left to
the user. The library should provide
void net::poll( timeout tm );
that delivers any pending callbacks from the context of
the thread that called poll, and
void net::async_poll( error_callback );
that acts "as if" it executes net::poll( infinite )
repeatedly in one or more background threads.
(error_callback will be called on errors; the 
synchronous variant will probably throw an exception
instead.)
The threading choices are certainly the most complex. Having said
that, these are mostly implementation choices, not interface choices
though one might need to extend the interface to support ideas like
the one you propose. The complexity really shows up when one wants to
write protocol libraries. For each choice presented to the network
user, the protocol library probably needs to provide similar support.

For example, an HTTP library would need to provide sync and async
under my proposal. Adding more styles of notification to the network
layer probably makes this job more difficult. Not to pass final
judgment here; just a consideration.

In my work on this, I attacked thread affinity outside the network
layer for the most part. In particular, what you called net::poll() I
had as a different object that handled a queue of
boost::function<>-like objects. That way, the main thread was not
forced to be only a network thread.

I do see room for this approach in cases where the main thread wants
to be a network-only thread and the queue approach feels too heavy. I
think that this would fit in my proposal as a different concrete
network-derived class as long as the abstraction is the same: sync
and async (with other rules TBD).
...
Whether the library uses multiple threads under the
hood, or how many, is implementation defined, even if
async_poll is not called; this depends on which model
delivers the best performance on the specific platform.
While I agree to some extent, the user must know the context in which
callbacks will be made. Or at least, they need to know a set of rules
to follow that will keep their code working for all implementations.

Forcing a single-threaded view of the network on the user imposes its
own penalty. At the top-most level, the programmer should have some
(hopefully small<g>) number of concrete network objects to pick
amongst. Once chosen, all mid-level libraries need to know that their
expectations of behavior are still going to be met. At the bottom, we
do what seems best to implement the abstraction on a given platform.
...
5. read_later, etc
I think that read_later should be dropped. async_read
is enough, in my opinion.
Personally I have little use for non-blocking style, but there is a
lot of desire for it. And it does have its merits for buffering data.
...
The functionality of write_later should be achievable
with a call to async_write with size 0; write_later
can be dropped, too.
I debated this myself, but decided to apply the axiom "don't
parameterize behavior" and ended up with the x_later approach. Under
the covers, they will be very much like async_x with no buffers on
which to operate.
...
async_read should not take a buffer; instead, the
callback should receive a pointer to a buffer managed
by the library that is guaranteed to be valid for the
duration of the callback. (Not by default, at least.)
async_write (by default) should not assume that the
passed buffer stays valid after async_write returns.
Rationale: buffer management is a pain with multiple
callbacks active ;-)
Others I believe have commented on this, but my take is that this
level should not make assumptions about the ultimate destination of
data. This approach, would in many cases, force data to be copied
"one more time". I agree that this can be a painful process, but
perhaps this is where higher level abstractions come into play, or
possibly (as you suggest) optional additional features for this
abstraction. Maybe passing (NULL,0) for I/O requests, or some sort of
set_buffer() mechanism. Don't know.
...
That's it for now. Please don't take these as criticisms
and wherever you see "should" read "in my humble opinion,
the design probably could be enhanced by".
Please understand my responses in the same way. :)

I (try<g>) to never take criticism of designs personally, and I do
feel that all criticism, especially constructively phrased as yours,
is only beneficial to everyone.
...
Thanks for reading.
Nope - thank you for taking the time read my proposal and then
respond with so many ideas!

Best regards,
Don

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

[boost] Re: [network] An RFC - updated

Don G