
Don G wrote:
- The user is not required to reference streams by pointer; streams are stack-allocated or are simply members of another object. Internally they hold a smart pointer to an implementation handle. Consider them stack-based proxies. The acceptor and the connector return the handle that is assigned to the stream.
So is this choice just for user simplification? Internally, the user is still holding a pointer, right? What are the copy semantics of the objects held by the user? This is where things can be tricky any way you go. Either the object is copyable and confusion can come via aliasing, or they aren't which is probably better in this case, but could possibly cause some idioms to not work (like "stream s = my_clever_stream_creator()"). My preference was to use shared_ptr<> as the semantics are well understood and objects can layer easily in obvious ways, but the cost is "->" vs "." syntax.
Yes, it holds a pointer, a shared_ptr actually; this makes it possible for some part of the library to temporarily hold a (potentially weak) reference to the handle without fear that it might be destroyed/closed. I think this will come in handy with asynchronous I/O. [1]

I think that stack semantics are much more intuitive for non-polymorphic objects (i.e. iostreams versus streambuffers). Your example could be rewritten as 'my_clever_stream_creator(s)' without really losing expressivity in the non-polymorphic case. This is exactly how connectors and acceptors work in my library. Currently the wrapper is copyable, but I will probably correct this unless I find very good reasons not to (the only one I can find currently is two threads wanting to do parallel I/O on the same file: the stream classes are not thread-safe, so each thread might want to have its own copy).

[1] Note that the internal file descriptor is closed when, and only when, the owner handle is destroyed. There is no close() call, although shutdown() is available. Thus there is no risk that the operating system might reuse the same file descriptor number while stale FDs are around. Useful if you need an 'FD->handle' map.
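The ownership model described above can be sketched roughly like this (all names here are illustrative, not the library's real API):

```cpp
#include <cassert>
#include <memory>

// Hypothetical implementation handle: in the real library this would
// wrap an OS file descriptor and release it when the last owner goes away.
struct stream_impl {
    int fd;
    explicit stream_impl(int fd_) : fd(fd_) {}
};

// Stack-based proxy: users declare it as a plain value (or a member),
// while the shared_ptr inside lets other parts of the library hold a
// (potentially weak) reference without fear of a premature close.
class stream {
    std::shared_ptr<stream_impl> impl_;
public:
    stream() = default;
    void attach(std::shared_ptr<stream_impl> h) { impl_ = std::move(h); }
    bool open() const { return static_cast<bool>(impl_); }
    long use_count() const { return impl_ ? impl_.use_count() : 0; }
};

// An acceptor or connector would return the handle that is then
// assigned to the stream, roughly like this:
inline std::shared_ptr<stream_impl> connect_stub() {
    return std::make_shared<stream_impl>(42); // fake descriptor
}
```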
- The preferred way to do input/output is to use standard-like algorithms (e.g. copy) with buffered stream adaptors and specialized input/output iterators. I believe that an efficient library can be written this way and be very C++-user-friendly. Classic read/write are still available, but their semantics might be surprising.
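As a rough analogy using plain iostreams (the library's actual buffered adaptors and iterator types are hypothetical here), the algorithm-centric style looks like this:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// std::copy driving I/O through stream iterators; the library's
// buffered stream adaptors would expose analogous input/output
// iterators over a socket instead of a stringstream.
inline std::string echo_through_streams(const std::string& in) {
    std::istringstream src(in);
    std::ostringstream dst;
    std::copy(std::istreambuf_iterator<char>(src),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(dst));
    return dst.str();
}
```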
I agree that this is the right approach for many users and protocols, but most network programmers (including myself<g>) need access to the primitive behaviors. They won't find them surprising unless the wrapping violates expectations coming from sockets-like programming.
They are available, in fact they are necessary to implement the rest of the library ;-), but they do not try to be user-friendly: they have many parameters, complex return values, and non-trivial preconditions and postconditions. For example, there is no guarantee that a write always writes the whole buffer in the absence of errors; it might do a partial write for no reason at all (obviously, minimizing the number of calls is a quality-of-implementation issue).
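A caller-side sketch of what those semantics imply: since a write may be partial even without an error, user code that wants "write everything" has to loop. The `write_fn` signature here is an assumption, not the library's real one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Assumed shape of the primitive: writes up to n bytes and returns how
// many were actually written -- possibly fewer, even with no error.
using write_fn = std::function<std::size_t(const char*, std::size_t)>;

// Caller-side loop turning the partial-write primitive into a
// write-everything operation.
inline std::size_t write_all(const write_fn& write, const char* buf, std::size_t n) {
    std::size_t total = 0;
    while (total < n) {
        std::size_t k = write(buf + total, n - total);
        if (k == 0) break; // no progress possible
        total += k;
    }
    return total;
}
```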
- All classes are concrete; no polymorphism is used (i.e. no virtuals). Polymorphic behaviour must currently be achieved by some external means (e.g. using the external polymorphism pattern; I think that the boost::IDL library would be great).
Here is where we are at different ends of the spectrum :). I didn't see any SSL code, so I can only imagine how the http code would handle SSL vs. non-SSL stream underneath. Ideally, this should not require two template instantiations like http<stream> and http<ssl_stream>, for example.
In the end, I thought templates had little to offer at this level. Parameterizing protocols by stream type seems (IMHO) to buy nothing in particular except the removal of virtual at the expense of the user having to specify <kind_of_stream> and _lots_ of extra code generation. The app should be able to layer objects as it sees fit and run-time polymorphism is (again, IMHO) the right solution to that problem.
I did start with a virtual-interfaces-based design (part of it is still visible; for example, the address object is way too complex for my current needs, and the domain object is a relic of a factory-based structure). It took me a long time to settle on a general stream interface that was at least partially satisfying. When I started implementing the concrete objects I found that many methods looked almost the same, so I refactored the code and put the common code in an implementation class. Then I thought that the library user could find it useful to deal with the actual stream type, and promoted the implementation class to a public object, with a virtual interface adapter optionally applicable (i.e. the external polymorphism pattern, or type erasure). Even the virtual adapter could be generated with the use of templates. I was happy with this design until I realized that I was just duplicating what could be better done with a dynamic_any or with boost::IDL, and I scrapped it. Only the concrete objects were left, and I have yet to find the need to put the interfaces back.

Dynamic polymorphism can certainly increase flexibility without template bloat, but would there really be that much code generated? An http<tcp_stream> and an http<ssl_stream> can certainly share 99% of the code; what you really need is a parametrized function that fetches the data from the stream and puts it in a buffer. Keep most of the code in a base class, or better, put the parametrized function (or functor) in a boost::function and store it in the non-templated http protocol object. Easy.

I didn't remove virtuals just for the sake of it; it is just an accident of design. I might consider putting them back in the internal handle, at least to give reads/writes polymorphic behaviour (this would mimic the iostream/streambuffer pair). BTW, I do not exactly understand what you mean by 'being free to layer objects'.
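A minimal sketch of that boost::function idea, using std::function as a stand-in; the `http_connection` name and its interface are invented for illustration:

```cpp
#include <cassert>
#include <functional>
#include <string>

// The transport-specific part is reduced to a single "fetch some data"
// functor, so the protocol object itself is not a template and the
// http-over-tcp and http-over-ssl cases share all of this code.
class http_connection {
    std::function<std::string()> fetch_; // returns "" when the stream is done
public:
    explicit http_connection(std::function<std::string()> fetch)
        : fetch_(std::move(fetch)) {}

    // Toy stand-in for the real protocol logic: drain the stream.
    std::string read_all() {
        std::string body;
        for (std::string chunk = fetch_(); !chunk.empty(); chunk = fetch_())
            body += chunk;
        return body;
    }
};
```

The same `http_connection` would be constructed with a TCP-backed functor in one case and an SSL-backed one in the other, with no second template instantiation.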
- Errors can be reported both with exceptions and with error codes. Exceptions are used by default unless error callbacks are passed. This seems to work quite well. Internally only error codes are used, and exceptions are thrown only at the outermost abstraction layer.
This is a good idea, and very similar to what I have done as well. At least for async. What is the behavior of blocking read in the face of error? Is the user callback made inside read? If so, what does read() return?
An exception is thrown, unless a callback is provided; if so, the error code is passed to the callback. A throwing read returns the amount of data read; a 'callback-augmented' read returns the callback itself (callbacks are passed by value, as with standard algorithms). The amount of data read is passed to the callback along with the error code.
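A toy model of this dual interface, with invented names and a deliberately failing primitive so both reporting styles can be exercised:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Invented error code and a fake primitive that always fails.
enum class io_error { none, connection_reset };

inline std::size_t raw_read(char*, std::size_t, io_error& err) {
    err = io_error::connection_reset; // simulate a failed read
    return 0;
}

// Default form: throws on error, returns the amount of data read.
inline std::size_t read(char* buf, std::size_t n) {
    io_error err = io_error::none;
    std::size_t got = raw_read(buf, n, err);
    if (err != io_error::none)
        throw std::runtime_error("read failed");
    return got;
}

// Callback-augmented form: never throws; the byte count and the error
// code are passed to the callback, and the callback (taken by value,
// as with standard algorithms) is returned to the caller.
template <class Callback>
Callback read(char* buf, std::size_t n, Callback cb) {
    io_error err = io_error::none;
    std::size_t got = raw_read(buf, n, err);
    cb(got, err);
    return cb;
}
```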
I will probably add status bits a-la iostreams.
What kind of bits? I can see eof and fail and those cannot be cleared. Others?
Currently the only bits that I plan to have are 'input buffer grown' and 'output buffer flushed', useful for asynchronous I/O and buffered streams. I actually do not have eof and fail (yet), because initially the stream was supposed to be thread-safe and so it had to be stateless. I will certainly add state to keep track of closed connections, and obviously it will only be reset if the internal handle is reinitialized.
- File streams. The library actually tries to be a generalized I/O framework, and file streams are provided for completeness.
At an abstract level, they are very similar and should behave in a similar way. I haven't tried to tackle that part because it is an area where there is already something in place, albeit not async, and I didn't want to try to integrate into iostream (not my cup of tea).
I think that file I/O is as important in network programming as network I/O itself, so it is useful to have a unified framework. BTW, polling for I/O readiness (i.e. the select model) does not make sense with files; I believe that the asynchronous I/O model is the only non-blocking I/O model that fits all stream types.
- The library can be extended simply by creating new handles. In addition to TCP streams there are Unix streams (which come almost for free :-) and file streams. SSL/TLS was present, but it got broken some time ago and I haven't had the time to fix it.
I would be most curious to know how SSL fit in your library and how other layers interact with or are shielded from it.
Nothing very complex, really. I wrote a thin wrapper over OpenSSL. I just took advantage of the ability to initialize a context with an already-connected file descriptor, then wrapped the context along in a handle. The read/write methods simply forwarded the call to SSL_read/SSL_write. I didn't really take advantage of the BIO infrastructure, which will probably be necessary to make an ssl_stream an adapter over any kind of stream.
- Input/Output buffer.
[some good stuff was here<g>]
From the little I've read through the code, it looks like this is a layer above the raw stream impl. I think that is exactly the right way to go. :)
Yes, the buffered stream is just a layer above the standard stream. Also, it should be very easy to implement a streambuffer on top of the buffered stream adaptor.

I think that the buffered adaptor will greatly simplify asynchronous buffer management. Asynchronous reads put data in the internal input buffer, which can be grown efficiently as much as needed (it is a deque); user code copies data from this buffer to its own buffer, or takes ownership of it. Asynchronous writes take data from the output buffer; user code copies data from its internal buffers to this buffer, or relinquishes ownership of its buffer, or, if it wants to keep ownership of the buffer and still avoid the extra copy, it must use a special buffer that is guaranteed to be immutable (i.e. once created it can never be changed; copies share the internal data using shared_ptrs. I do not have it yet; I will add it when I attack the asynchronous I/O problem). If the user does not want automatic buffer management, he can still use the unbuffered functions, but then it is his job to guarantee that buffers stay valid and unchanged until the operation is completed (i.e. a mess!).
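The immutable-buffer idea can be sketched like this (`const_buffer` is a hypothetical name; as noted above, the real class does not exist yet):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical immutable buffer: once created, the bytes can never
// change, so copies can safely share storage through a shared_ptr and
// a pending asynchronous write can keep the data alive with no memcpy.
class const_buffer {
    std::shared_ptr<const std::string> data_;
public:
    explicit const_buffer(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}
    const char* data() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }
    long use_count() const { return data_.use_count(); } // for illustration
};
```

An async write operation would simply keep its own `const_buffer` copy until completion, while the user remains free to keep (but not modify) theirs.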
Missing (definitelly not complete list):
The library is fully synchronous for now. I'm still considering how to add async support. I think I will implement it in the buffered adaptor: I/O is done asynchronously to the internal buffers, which can grow as much as necessary. Timeouts are definitely a must-have.
Agreed on async and timeout. Can sync calls be manually/explicitly canceled? In my experience (and opinion<g>), a reader/writer MT design needs cancel semantics. Without it, such an app cannot be responsive to outside stimuli.
No, not yet; it's still on my todo list. Well, you can obviously cancel a pending operation by shutting down the stream, but a much gentler solution is needed :-).
Final notes:
I've seen that the current consensus is to encode the stream type in the address, so as to allow dynamic behaviour: the actual transport is selected only at runtime, based on the address string. I think this is a bad decision (I considered doing it while implementing my library), and this is why:
I am more and more convinced that this is not the right approach for the library core, but for different reasons (see other posts). It could be offered as a stand-alone library for an app that needs this, but I think it is most likely a trivial map problem (plus a little text manipulation).
Yes, as an add-on would be fine, but the transport-encoded address should not be a central concept.
- C++ is a static language, let's leave these niceties to more dynamic languages.
I think C++ is quite dynamic (not in the JavaScript way<g>) and should exercise that power where appropriate :) It pains me to see Java servers everywhere. C++ can and should have all the HTTP, SSL and server stuff, and make it as easy to develop servlet-like things. One does not need reflection, dynamic loading or whatnot to play well in that space. One does need standard (or at least de facto) libraries. Without them, effort is fragmented and disjoint.
Which is why I joined boost. :)
Well, with 'static' I meant "do as much work at compile time as possible", which translates to "catch as many errors as early as possible" ;-). C++ is certainly dynamic, but I kind of like the way everything is NOT always an object. [Really off-topic] BTW, I would *love* complete, standard, compile-time reflection facilities.
[...]
- It is extremely insecure. In a network library security must be paramount. If the transport type were encoded in the address, it would be much harder to validate externally received addresses.
Good point. Validation is one thing, but meeting expectations of the software is another. In some cases, just any transport may not be appropriate and hence should be validated. This can be done from the string form, of course, but it presents a wider interface.
Just to give an example: Stevens, in Unix Network Programming Vol. 1, shows an example of getaddrinfo that, as an extension, could return Unix domain sockets in addition to IPv4 and IPv6 sockets. Glibc actually implemented the extension. It was later removed because of security concerns; see this post for details: http://sources.redhat.com/ml/libc-hacker/2001-05/msg00044.html You might want to treat streams polymorphically once created, but at creation time the type should be statically known by the user code, because it needs to be aware that not all streams have the same semantics. You might say that not all streams are 'created' equal :-).
[...]
Sorry for the long post, just tryin' to be useful :-).
Don't be sorry. I am sure I've written longer posts and it was helpful.
Well, this *certainly* was a long post. I hope I've cleared up some details of my library. Now, let's get back to code. -- Giovanni P. Deretta