[boost] Re: [network] An RFC - updated

23 Apr 2005

      Hi Peter,
...
The network root issue is at, hm, the root of our
disagreement.
Indeed :)
...
...
In many of my uses, I have multiple network-derived
types in use at the same time (serial, HTTP tunnel,
TCP/IP).
The question is why do you need multiple network
objects at the client side.
My sad story: Over the past couple years I've had the joy of writing
some ActiveX controls that run in IE (please don't shoot; they were
the good kind<g>). In doing so, I have been best served by not having
any active stuff running behind the scenes, especially threads. The
desire to keep a library of pure objects has flavored my choices
here.

When an ActiveX object dtor is called, I want _all_ activity related
to that instance to stop. Other instances may be held by other
threads and I don't want all interfaces to be thread-safe. Anyway,
that is a big part of why I avoid designs that have significant
apparatus globally managed.
...
Under the address-centric model, the form of the
address determines the network to use. The library
is free to maintain several network objects under
the hood, ...
There is the question of which network types an application needs.
It is not likely they need all of them. So, the app must pre-register
the types it wants and only those address forms will work.
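
A minimal sketch of that pre-registration idea, under stated assumptions: the names network, network_registry, add and resolve are hypothetical and appear in no proposal in this thread. The application registers only the schemes it wants, and any other address form is rejected.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a concrete network (TCP/IP, serial, ...).
struct network {
    explicit network(std::string n) : name(std::move(n)) {}
    std::string name;
};
using network_ptr = std::shared_ptr<network>;

// Maps a scheme such as "tcp4" or "com1" to a pre-registered network.
class network_registry {
public:
    void add(const std::string& scheme, network_ptr net) {
        networks_[scheme] = std::move(net);
    }

    // Splits "scheme:rest" at the first ':' and looks the scheme up.
    // Throws if there is no scheme or the scheme was never registered.
    network_ptr resolve(const std::string& address, std::string& rest) const {
        const std::string::size_type colon = address.find(':');
        if (colon == std::string::npos)
            throw std::invalid_argument("address has no scheme: " + address);
        const auto it = networks_.find(address.substr(0, colon));
        if (it == networks_.end())
            throw std::invalid_argument("unregistered scheme in: " + address);
        rest = address.substr(colon + 1);
        return it->second;
    }

private:
    std::map<std::string, network_ptr> networks_;
};
```

With only "tcp4" registered, resolving "tcp4:/www.example.com:5757" yields the TCP network plus the scheme-specific remainder, while "com1:/57600,n,8,1" throws. An application that never registers the serial network does not need to link it.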

I hope I said this, but perhaps I didn't<g>: I think an
address-to-network mapping is reasonable. I just don't want that to
be the only way to fly. Given that, I suppose one could use
"tcp:http://www.boost.org" in the spirit of URL: everything to the
right of ":" is the scheme-specific-part. :)
...
In general, data-driven designs are much more flexible.
I can read
server.address=tcp4:/www.example.com:5757
from a configuration file and connect. The line might
have been
server.address=com1:/57600,n,8,1
but my application doesn't care, as long as someone
at the other end responds according to the appropriate
server protocol.
Repeat from above: I do see merit in the general approach.
...
Under the explicit network model, I'll need to duplicate
the scheme-to-network logic myself. Not that a
map<string, network_ptr> is that much work; there's just
no benefit.
For the record, I've never needed such a map. There are so few places
that accept or connect, and then, so few different network objects,
that it just hasn't happened. For example:

   void tcp_tab_page::on_click_go ()
   {
      net::address_ptr addr = tcp_->new_address(url);
      session_mgr->start(addr);
   }

The method existed in a "TCP" page of a tabbed dialog, which of
course knows that connections proceed over TCP. It created an address
object and called into network-agnostic code to proceed.
...
The "self contained address as a string" model also has
another advantage: it allows you to take a legacy design
such as std::fopen and enhance it with network capability
in a backward-compatible way.
While I still agree with the gist of your argument, I am not so sure
everyone would appreciate their apps growing by sizeof(http_lib) +
sizeof(ftp_lib) + sizeof(uucp_lib) + sizeof(gopher_lib) + ... to
support this approach. :)
...
I'm not sure I understand. A communication port is a
port, not a network. I've recently dealt with one and
it's very much like a socket. :-)
You are quite right. I was not giving you my context here. In my use
of serial communications, we did provide an entire network over that
line including muxing streams, emulating datagrams, etc. Underneath
all that was the true nature of the serial line.

So, at one level, the serial line is just a stream. Over that stream,
one can layer an entire network model, including "me" and "thee"
addressing. :)
...
The form above is a valid URI/URL, by the way. The
single slash means that the text after the scheme is
application-dependent and does not follow the
host:port/path?query#anchor format.
Yes, see my comment above.
...
Why I prefer tcp:/host:port instead of scheme://host?
Let's get back to enhancing std::fopen in a backward-
compatible way. I'd expect fopen on
http://www.example.com to return the data stream
obtained by the corresponding GET query, for obvious
reasons.
I'm not sure I understand the difference you are drawing between ":/"
and "://". The way I read the URL/I syntax, many options will work.
Here is one that would be unambiguous:

 - ipv4:tcp:http://www.boost.org
     ipv4 denotes network choice; scheme-specifics
     follow:
       tcp denotes stream as opposed to datagram,
        and again, details follow
       http denotes how to talk (say, vs. HTTPS or FTP)

  - ipv4:http://www.boost.org:80
    tcp can be assumed as long as protocol doesn't
    have udp as well (like echo or discard<g>).
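
One way to read such layered addresses is to peel off schemes from the left until the remainder begins with "/". The sketch below assumes exactly that rule. The names layered_address, is_scheme and split_layers are made up for illustration, and a form without any scheme (a bare "host:port", say) is not handled.

```cpp
#include <cctype>
#include <string>
#include <vector>

struct layered_address {
    std::vector<std::string> schemes; // outermost first, e.g. {"ipv4", "tcp", "http"}
    std::string rest;                 // what is left after the last scheme
};

// True if s looks like a URI scheme: a letter followed by
// letters, digits, '+', '-' or '.'.
inline bool is_scheme(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}

// Peels "scheme:" prefixes off the front until the remainder starts
// with '/' or no further valid scheme is found.
inline layered_address split_layers(const std::string& address) {
    layered_address out;
    out.rest = address;
    for (;;) {
        if (!out.rest.empty() && out.rest[0] == '/')
            break;
        const std::string::size_type colon = out.rest.find(':');
        if (colon == std::string::npos)
            break;
        const std::string head = out.rest.substr(0, colon);
        if (!is_scheme(head))
            break;
        out.schemes.push_back(head);
        out.rest.erase(0, colon + 1);
    }
    return out;
}
```

Under this rule "ipv4:tcp:http://www.boost.org" splits into ipv4, tcp, http with "//www.boost.org" left over, and "tcp:/127.0.0.1" splits into tcp with "/127.0.0.1" left over.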
...
...
...
I think that the UDP broadcast address should be
represented by udp:/0.0.0.0, and the TCP loopback
should be tcp:/127.0.0.1, as usual. But I may be
wrong.
All of these are forcing TCP/IP (IPv4 even<g>)
concepts to the user. The central idea of the
abstraction I proposed is that "datagram" and
"stream" are the behavior-bundles to which one
should write protocols, not TCP/UDP or sockets.
The notion of loopback and broadcast can carry
over from one network type to another, but these
textual forms do not.
Yes; another manifestation of the network issue. I
agree that in a network-centric design your approach
is preferable. In an address-centric design, the
TCP/IP4 broadcast address is not portable between
networks.
Indeed, my design is network-centric. Its goal is to provide a
complete (enough<g>) encapsulation of network behaviors and features
that one can write higher level concepts or protocols cleanly. I don't
want to redefine the semantics of networking. The programmer using
this layer is "network programming on purpose". Going up the food
chain beyond this level is good and expected. :)
...
One consistent scheme that I follow is that
functions that do not have side effects and
just return something are called "something()",
and functions that do something are verbs,
like create_stream.
Sounds reasonable. Using this approach, I constantly stumble on this
kind of thing (just did today in fact<g>):

   void hierarchy_object::foo ()
   {
       hierarchy_object * parent = parent(); // oops!
   }

This can be avoided by using get_parent() which is where I have gone
since writing good ol' hierarchy_object. However, my audience here is
different and I will be assimila..., er, adapt. ;)
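
For reference, a small sketch of the get_parent() variant. The depth() member and the parent_ data member are invented for the illustration and are not from the original class.

```cpp
class hierarchy_object {
public:
    explicit hierarchy_object(hierarchy_object* parent = nullptr)
        : parent_(parent) {}

    hierarchy_object* get_parent() const { return parent_; }

    // Counts the ancestors of this object.
    int depth() const {
        int n = 0;
        // A local named 'parent' is harmless because the accessor is
        // get_parent(). Had the accessor been named parent(), then
        //     hierarchy_object* parent = parent();
        // would not compile: the local is already in scope inside its
        // own initializer and hides the member function.
        for (const hierarchy_object* parent = get_parent(); parent;
             parent = parent->get_parent())
            ++n;
        return n;
    }

private:
    hierarchy_object* parent_;
};
```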
...
Your example above is a good argument in favor of
create_stream( address )
for consistency with
create_ssl_stream( stream )
(or however it ends up being called.)
Except that one might then expect create_stream(stream) to be valid,
which it is not. :)
...
An OpenSSL stream would make a terrific example,
by the way. I've recently dealt with one of these,
too. ;-)
This kind of substitutability is the essence of what I am working for
in this design and OpenSSL is part of my plan, but it is a bit higher
level than where I am currently. ;)
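
A rough sketch of the substitutability in question, with every name (stream, memory_stream, framing_stream, create_framing_stream, send_greeting) assumed for illustration. The framing layer is only a stand-in for something like SSL: it adds a length prefix so the layering is visible and does no encryption.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract stream; protocol code is written against this.
struct stream {
    virtual ~stream() = default;
    virtual void write(const std::string& data) = 0;
};
using stream_ptr = std::shared_ptr<stream>;

// Stand-in for a transport stream: collects the bytes "on the wire".
struct memory_stream : stream {
    std::string wire;
    void write(const std::string& data) override { wire += data; }
};

// Stand-in for a layered stream. It wraps another stream and prefixes
// each write with its length.
class framing_stream : public stream {
public:
    explicit framing_stream(stream_ptr lower) : lower_(std::move(lower)) {}
    void write(const std::string& data) override {
        lower_->write(std::to_string(data.size()) + ":" + data);
    }

private:
    stream_ptr lower_;
};

// Mirrors the create_ssl_stream(stream) shape: a stream built on a stream.
inline stream_ptr create_framing_stream(stream_ptr lower) {
    return std::make_shared<framing_stream>(std::move(lower));
}

// Protocol code that neither knows nor cares which stream it was given.
inline void send_greeting(stream& s) { s.write("hello"); }
```

send_greeting works unchanged on the raw stream and on the layered one, which is the property a protocol library author would rely on.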
...
I'm not sure. net::poll is specifically intended to
preserve the internal structure of your library. You
only need to defer dispatching the callbacks until
net::poll is called (unless net::async_poll is in
effect.)
I am not sure I understand your async_poll suggestion (sorry about
that<g>). I agree that net::poll() would fit with a common desire to
have single threaded network programs, but might complicate things
where the program is a GUI. Integrating with the GUI loop is a study
in compromise (at least for Windows).
...
It also gives you the freedom to make net::poll the
primary model since it maps very naturally to
select/epoll. But you aren't forced to do that.
I like the idea of net::poll() for some uses, but it doesn't fit what
I often need (GUI integration). It does fit with select/epoll
especially on platforms where the number of objects that can go in an
fd_set is > 64.

I don't know if it was clear from previous posts, but I do have in
mind a higher level net::poll() like library. The reason I prefer
that approach to net::poll() is that not all async activities are
network related. A timer comes to mind here. Also, the GUI event
loop. Also, just plain "do this next, but not now" queuing. I would
want the ability to have all that deliver out of the same pump as
network completion callbacks.

My current proposal does not include explicit support for this. With
the general async facility I am describing, net::poll() would be
relegated to only those pure network-only apps that absolutely insist
on the select w/no threads approach.
...
...
I do see room for this approach in cases where the
main thread wants to be a network-only thread and
the queue approach feels too heavy. I think that
this would fit in my proposal as a different
concrete network-derived class as long as the
abstraction is the same: sync and async (with
other rules TBD).
No, I don't believe that you need another network
class for that. :-)
I agree, but others don't. Some folks don't want any background
threads; just a single thread doing one uber select/epoll call
(again, for platforms that can handle it). The only way I can see to
accommodate that is a different concrete network object. The ones I
would write initially would probably not fit this desire, but again,
the interface and behavior contract probably can.
...
...
While I agree to some extent, the user must know
the context in which callbacks will be made. Or
at least, they need to know a set of rules to
follow that will keep their code working for all
implementations.
That's the whole idea. When using net::poll the
context is the thread that called net::poll. When
using async_poll (your current modus operandi),
the context is an unspecified background thread.
The user is in control of the context.
I am specifically concerned with the hypothetical protocol library
author here more than the application author. If I want to write an
SSL stream, for example, I need to know how my use of the real stream
will behave and what I am allowed to do in the callback context.

While the application layer may make the final call, that call cannot
invalidate the contract assumed by the protocol library author.
...
...
Forcing a single-threaded view of the network on
the user imposes its own penalty. At the top-most
level, the programmer should have some (hopefully
small<g>) number of concrete network objects to
pick amongst. Once chosen, all mid-level libraries
need to know that their expectations of behavior
are still going to be met. At the bottom, we do
what seems best to implement the abstraction on a
given platform.
I don't understand this paragraph, sorry.
Sorry back at you for that confusing paragraph. This is basically the
same concern as above, about protocol libraries.

- At the app layer, the developer sometimes wants to
  choose single threaded(!) vs. don't care but deliver it
  on this thread please vs. use whatever thread context
  is best + use my quad CPU's please +  minimize context
  switches (aka "give me your best shot, I can take it").

- Middle level protocol libraries (like SSL or HTTP)
  must not be presented with different behaviors from
  the abstract interfaces. If they followed the rules
  (TBD), they should continue to work regardless of a
  choice made by the application developer.

- On the bottom, someone gets to (re)implement this
  abstraction in various ways for different platforms
  and/or different run-time models as necessary.
...
Yes, makes sense. I view the problem from the other
side: what are the semantics of an asynchronous
read/write with size 0? Answer: exactly the same as
those of read/write_later.
Or one could answer: illegal. :)
...
Question: why keep *_later and duplicate
functionality then?
Clarity? Grep-ability? Perhaps. It's a bit on the fluffy side<g>, so
I won't lose any sleep either way. My inclination would be to make
read w/size=0 illegal because it might catch an error closer to the
origin and read_later() would be used where that was your goal: not
now, later.
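
A sketch of that split, assuming a hypothetical stream_sketch class: async_read rejects a zero size at the call site, and read_later only asks to be told when data is available. The deliver() member stands in for the network layer, and none of the signatures are taken from the actual proposal.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <utility>

class stream_sketch {
public:
    using read_done = std::function<void(std::size_t)>;
    using readable = std::function<void()>;

    // Read up to 'size' bytes into 'buffer'. A zero size is treated as
    // a caller error and reported immediately.
    void async_read(char* buffer, std::size_t size, read_done done) {
        if (size == 0)
            throw std::invalid_argument(
                "async_read: size must be > 0; use read_later");
        buffer_ = buffer;
        size_ = size;
        done_ = std::move(done);
        ready_ = nullptr;
    }

    // "Not now, later": only ask to be told when data is available.
    void read_later(readable ready) {
        ready_ = std::move(ready);
        done_ = nullptr;
    }

    // Stand-in for the network layer delivering bytes.
    void deliver(const char* data, std::size_t n) {
        if (done_) {
            const std::size_t count = std::min(n, size_);
            std::memcpy(buffer_, data, count);
            read_done cb = std::move(done_);
            done_ = nullptr;
            cb(count);
        } else if (ready_) {
            readable cb = std::move(ready_);
            ready_ = nullptr;
            cb();
        }
    }

private:
    char* buffer_ = nullptr;
    std::size_t size_ = 0;
    read_done done_;
    readable ready_;
};
```

The two entry points are easy to grep for separately, and an accidental zero-length read fails where it was issued, not at some later point.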
...
Passing NULL is good enough for async_read, but
the same can't be used with async_write to choose
between copy/trust-the-buffer-will-be-there semantics.
True enough.
...
Again, I'm not opposed to manual buffer management.
Glad to (re)hear it. :)
...
It has its uses. However in my experience so far
the buffer management isn't much fun, is error
prone, and in the common case is not optimized to
be more efficient than the automatic case. In some
cases a naive/straightforward manual buffer 
management scheme can be significantly less
efficient.
I would be happy to entertain ideas on how to provide both kinds of
buffer management, especially if there is a way to eliminate as much
cost as possible (in terms of "if checks" and code linkage) when
automatic is never used.

Again, thanks for all the time and thought energy.

Best regards,
Don


Don G