
Don G wrote:
- The user is not required to reference streams by pointer; streams are stack-allocated or are simply members of another object. Internally they hold a smart pointer to an implementation handle. Consider them stack-based proxies. The acceptor and the connector return the handle that is assigned to the stream.
So is this choice just for user simplification? Internally, the user is still holding a pointer, right? What are the copy semantics of the objects held by the user? This is where things can be tricky any way you go. Either the object is copyable and confusion can come via aliasing, or they aren't which is probably better in this case, but could possibly cause some idioms to not work (like "stream s = my_clever_stream_creator()"). My preference was to use shared_ptr<> as the semantics are well understood and objects can layer easily in obvious ways, but the cost is "->" vs "." syntax.
Yes, it holds a pointer, a shared_ptr actually; this makes it possible for some part of the library to temporarily hold a (potentially weak) reference to the handle without fear that it might be destroyed/closed. I think this will come in handy with asynchronous I/O. [1]

I think that stack semantics are much more intuitive for non-polymorphic objects (i.e. iostreams versus streambuffers). Your example could be rewritten as 'my_clever_stream_creator(s)' without really losing expressivity in the non-polymorphic case. This is exactly how connectors and acceptors work in my library. Currently the wrapper is copyable, but I will probably correct this unless I find very good reasons not to (the only one I can find currently is two threads wanting to do parallel I/O on the same file: the stream classes are not thread-safe, so each thread might want to have its own copy).

[1] Note that the internal file descriptor is closed when, and only when, the owner handle is destroyed. There is no close() call, although shutdown() is available. Thus there is no risk that the operating system might reuse the same file descriptor number while stale FDs are around. Useful if you need an 'FD->handle' map.
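The ownership model described above can be sketched roughly like this (all names here are illustrative, not the library's real API):

```cpp
#include <cassert>
#include <memory>

// Hypothetical implementation handle: in the real library this would
// wrap an OS file descriptor and release it when the last owner goes away.
struct stream_impl {
    int fd;
    explicit stream_impl(int fd_) : fd(fd_) {}
};

// Stack-based proxy: users declare it as a plain value (or a member),
// while the shared_ptr inside lets other parts of the library hold a
// (potentially weak) reference without fear of a premature close.
class stream {
    std::shared_ptr<stream_impl> impl_;
public:
    stream() = default;
    void attach(std::shared_ptr<stream_impl> h) { impl_ = std::move(h); }
    bool open() const { return static_cast<bool>(impl_); }
    long use_count() const { return impl_ ? impl_.use_count() : 0; }
};

// An acceptor or connector would return the handle that is then
// assigned to the stream, roughly like this:
inline std::shared_ptr<stream_impl> connect_stub() {
    return std::make_shared<stream_impl>(42); // fake descriptor
}
```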
- The preferred way to do input/output is to use standard-like algorithms (e.g. copy) with buffered stream adaptors and specialized input/output iterators. I believe that an efficient library can be written this way and be very C++-user-friendly. Classic read/write are still available, but their semantics might be surprising.
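As a rough analogy using plain iostreams (the library's actual buffered adaptors and iterator types are hypothetical here), the algorithm-centric style looks like this:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// std::copy driving I/O through stream iterators; the library's
// buffered stream adaptors would expose analogous input/output
// iterators over a socket instead of a stringstream.
inline std::string echo_through_streams(const std::string& in) {
    std::istringstream src(in);
    std::ostringstream dst;
    std::copy(std::istreambuf_iterator<char>(src),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(dst));
    return dst.str();
}
```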
I agree that this is the right approach for many users and protocols, but most network programmers (including myself<g>) need access to the primitive behaviors. They won't find them surprising unless the wrapping violates expectations coming from sockets-like programming.
They are available, in fact they are necessary to implement the rest of the library ;-), but they do not try to be user-friendly: they have many parameters, complex return values, and non-trivial preconditions and postconditions. For example, there is no guarantee that a write always writes the whole buffer in the absence of errors; it might do a partial write for no reason at all (obviously, minimizing the number of calls is a quality-of-implementation issue).
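A caller-side sketch of what those semantics imply: since a write may be partial even without an error, user code that wants "write everything" has to loop. The `write_fn` signature here is an assumption, not the library's real one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Assumed shape of the primitive: writes up to n bytes and returns how
// many were actually written -- possibly fewer, even with no error.
using write_fn = std::function<std::size_t(const char*, std::size_t)>;

// Caller-side loop turning the partial-write primitive into a
// write-everything operation.
inline std::size_t write_all(const write_fn& write, const char* buf, std::size_t n) {
    std::size_t total = 0;
    while (total < n) {
        std::size_t k = write(buf + total, n - total);
        if (k == 0) break; // no progress possible
        total += k;
    }
    return total;
}
```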
- All classes are concrete; no polymorphism is used (i.e. no virtuals). Polymorphic behaviour must currently be achieved by some external means (e.g. using the external polymorphism pattern; I think that the boost::IDL library would be great).
Here is where we are at different ends of the spectrum :). I didn't see any SSL code, so I can only imagine how the http code would handle SSL vs. non-SSL stream underneath. Ideally, this should not require two template instantiations like http<stream> and http<ssl_stream>, for example.
In the end, I thought templates had little to offer at this level. Parameterizing protocols by stream type seems (IMHO) to buy nothing in particular except the removal of virtual at the expense of the user having to specify <kind_of_stream> and _lots_ of extra code generation. The app should be able to layer objects as it sees fit and run-time polymorphism is (again, IMHO) the right solution to that problem.
I did start with a virtual-interfaces-based design (part of it is still visible; for example, the address object is way too complex for my current needs, and the domain object is a relic of a factory-based structure). It took me a long time to settle on a general stream interface that was at least partially satisfying. When I started implementing the concrete objects I found that many methods looked almost the same, so I refactored the code and put the common code in an implementation class. Then I thought that the library user could find it useful to deal with the actual stream type, and promoted the implementation class to a public object, with a virtual interface adapter optionally applicable (i.e. the external polymorphism pattern, or type erasure). Even the virtual adapter could be generated with the use of templates. I was happy with this design until I realized that I was just duplicating what could be better done with a dynamic_any or with boost::IDL, and I scrapped it. Only the concrete objects were left, and I have yet to find the need to put the interfaces back.

Dynamic polymorphism can certainly increase flexibility without template bloat, but would there really be that much code generated? An http<tcp_stream> and an http<ssl_stream> can certainly share 99% of the code; what you really need is a parametrized function that fetches the data from the stream and puts it in a buffer. Keep most of the code in a base class, or better, put the parametrized function (or functor) in a boost::function and store it in the non-templated http protocol object. Easy.

I didn't remove virtuals just for the sake of it; it is just an accident of design. I might consider putting them back in the internal handle, at least to give reads/writes polymorphic behaviour (this would mimic the iostream/streambuffer pair). BTW, I do not exactly understand what you mean by 'being free to layer objects'.
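A minimal sketch of that boost::function idea, using std::function as a stand-in; the `http_connection` name and its interface are invented for illustration:

```cpp
#include <cassert>
#include <functional>
#include <string>

// The transport-specific part is reduced to a single "fetch some data"
// functor, so the protocol object itself is not a template and the
// http-over-tcp and http-over-ssl cases share all of this code.
class http_connection {
    std::function<std::string()> fetch_; // returns "" when the stream is done
public:
    explicit http_connection(std::function<std::string()> fetch)
        : fetch_(std::move(fetch)) {}

    // Toy stand-in for the real protocol logic: drain the stream.
    std::string read_all() {
        std::string body;
        for (std::string chunk = fetch_(); !chunk.empty(); chunk = fetch_())
            body += chunk;
        return body;
    }
};
```

The same `http_connection` would be constructed with a TCP-backed functor in one case and an SSL-backed one in the other, with no second template instantiation.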
- Errors can be reported both with exceptions and with error codes. Exceptions are used by default unless error callbacks are passed. This seems to work quite well. Internally only error codes are used, and exceptions are thrown only at the outermost abstraction layer.
This is a good idea, and very similar to what I have done as well. At least for async. What is the behavior of blocking read in the face of error? Is the user callback made inside read? If so, what does read() return?
An exception is thrown, unless a callback is provided; if so, the error code is passed to the callback. A throwing read returns the amount of data read; a 'callback-augmented' read returns the callback itself (callbacks are passed by value, as with standard algorithms). The amount of data read is passed to the callback along with the error code.
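A toy model of this dual interface, with invented names and a deliberately failing primitive so both reporting styles can be exercised:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Invented error code and a fake primitive that always fails.
enum class io_error { none, connection_reset };

inline std::size_t raw_read(char*, std::size_t, io_error& err) {
    err = io_error::connection_reset; // simulate a failed read
    return 0;
}

// Default form: throws on error, returns the amount of data read.
inline std::size_t read(char* buf, std::size_t n) {
    io_error err = io_error::none;
    std::size_t got = raw_read(buf, n, err);
    if (err != io_error::none)
        throw std::runtime_error("read failed");
    return got;
}

// Callback-augmented form: never throws; the byte count and the error
// code are passed to the callback, and the callback (taken by value,
// as with standard algorithms) is returned to the caller.
template <class Callback>
Callback read(char* buf, std::size_t n, Callback cb) {
    io_error err = io_error::none;
    std::size_t got = raw_read(buf, n, err);
    cb(got, err);
    return cb;
}
```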
I will probably add status bits a-la iostreams.
What kind of bits? I can see eof and fail and those cannot be cleared. Others?
Currently the only bits that I plan to have are 'input buffer grown' and 'output buffer flushed', useful for asynchronous I/O and buffered streams. I actually do not have eof and fail (yet), because initially the stream was supposed to be thread-safe and so it had to be stateless. I will certainly add state to keep track of closed connections, and obviously it will only be reset if the internal handle is reinitialized.
- File streams. The library actually tries to be a generalized I/O framework, and file streams are provided for completeness.
At an abstract level, they are very similar and should behave in a similar way. I haven't tried to tackle that part because it is an area where there is already something in place, albeit not async, and I didn't want to try to integrate into iostream (not my cup of tea).
I think that file I/O is as important in network programming as network I/O itself, so it is useful to have a unified framework. BTW, polling for I/O readiness (i.e. the select model) does not make sense with files; I believe that the asynchronous I/O model is the only non-blocking I/O model that fits all stream types.
- The library can be extended simply by creating new handles. In addition to TCP streams there are Unix streams (which come almost for free :-) and file streams. SSL/TLS was present, but it got broken some time ago and I haven't had the time to fix it.
I would be most curious to know how SSL fit in your library and how other layers interact with or are shielded from it.
Nothing very complex, really. I wrote a thin wrapper over OpenSSL. I just took advantage of the ability to initialize a context with an already-connected file descriptor, then wrapped the context along in a handle. The read/write methods simply forwarded the call to SSL_read/SSL_write. I didn't really take advantage of the BIO infrastructure, which will probably be necessary to make an ssl_stream an adapter over any kind of stream.
- Input/Output buffer.
[some good stuff was here<g>]
From the little I've read through the code, it looks like this is a layer above the raw stream impl. I think that is exactly the right way to go. :)
Yes, the buffered stream is just a layer above the standard stream. Also, it should be very easy to implement a streambuffer on top of the buffered stream adaptor.

I think that the buffered adaptor will greatly simplify asynchronous buffer management. Asynchronous reads put data in the internal input buffer, which can be grown efficiently as much as needed (it is a deque); user code copies data from this buffer to its own buffer, or takes ownership of it. Asynchronous writes take data from the output buffer; user code copies data from its internal buffers to this buffer, or relinquishes ownership of its buffer, or, if it wants to keep ownership of the buffer and still avoid the extra copy, it must use a special buffer that is guaranteed to be immutable (i.e. once created it can never be changed; copies share the internal data using shared_ptrs. I do not have it yet; I will add it when I attack the asynchronous I/O problem). If the user does not want automatic buffer management, he can still use the unbuffered functions, but then it is his job to guarantee that buffers stay valid and unchanged until the operation is completed (i.e. a mess!).
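The immutable-buffer idea can be sketched like this (`const_buffer` is a hypothetical name; as noted above, the real class does not exist yet):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical immutable buffer: once created, the bytes can never
// change, so copies can safely share storage through a shared_ptr and
// a pending asynchronous write can keep the data alive with no memcpy.
class const_buffer {
    std::shared_ptr<const std::string> data_;
public:
    explicit const_buffer(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}
    const char* data() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }
    long use_count() const { return data_.use_count(); } // for illustration
};
```

An async write operation would simply keep its own `const_buffer` copy until completion, while the user remains free to keep (but not modify) theirs.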
Missing (definitelly not complete list):
The library is fully synchronous for now. I'm still considering how to add async support. I think I will implement it in the buffered adaptor: I/O is done asynchronously to the internal buffers, which can grow as much as necessary. Timeouts are definitely a must-have.
Agreed on async and timeout. Can sync calls be manually/explicitly canceled? In my experience (and opinion<g>), a reader/writer MT design needs cancel semantics. Without it, such an app cannot be responsive to outside stimuli.
No, not yet; it's still on my todo list. Well, you can obviously cancel a pending operation by shutting down the stream, but a much gentler solution is needed :-).
Final notes:
I've seen that the current consensus is to encode the stream type in the address, so as to allow dynamic behaviour: the actual transport is selected only at runtime, based on the address string. I think this is a bad decision (I considered doing it while implementing my library), and this is why:
I am more and more convinced that this is not the right approach for the library core, but for different reasons (see other posts). It could be offered as a stand-alone library for an app that needs this, but I think it is most likely a trivial map problem (plus a little text manipulation).
Yes, as an add-on would be fine, but the transport-encoded address should not be a central concept.
- C++ is a static language, let's leave these niceties to more dynamic languages.
I think C++ is quite dynamic (not in the JavaScript way<g>) and should exercise that power where appropriate :) It pains me to see Java servers everywhere. C++ can and should have all the HTTP, SSL and server stuff, and make it as easy to develop servlet-like things. One does not need reflection, dynamic loading or whatnot to play well in that space. One does need standard (or at least de facto) libraries. Without them, effort is fragmented and disjoint.
Which is why I joined boost. :)
Well, with 'static' I meant "do as much work at compile time as possible", which translates to "catch as many errors as early as possible" ;-). C++ is certainly dynamic, but I kind of like the way everything is NOT always an object. [Really off-topic] BTW, I would *love* complete, standard, compile-time reflection facilities.
[...]
- It is extremely insecure. In a network library security must be paramount. If the transport type were encoded in the address, it would be much harder to validate externally received addresses.
Good point. Validation is one thing, but meeting expectations of the software is another. In some cases, just any transport may not be appropriate and hence should be validated. This can be done from the string form, of course, but it presents a wider interface.
Just to give an example: Stevens, in Unix Network Programming Vol. 1, shows an example of getaddrinfo that, as an extension, could return Unix domain sockets in addition to IPv4 and IPv6 sockets. Glibc actually implemented the extension. It was later removed because of security concerns; see this post for details: http://sources.redhat.com/ml/libc-hacker/2001-05/msg00044.html You might want to treat streams polymorphically once created, but at creation time the type should be statically known by the user code, because it needs to be aware that not all streams have the same semantics. You might say that not all streams are 'created' equal :-).
[...]
Sorry for the long post, just tryin' to be useful :-).
Don't be sorry. I am sure I've written longer posts and it was helpful.
Well, this *certainly* was a long post. I hope I've cleared up some details of my library. Now, let's get back to code. -- Giovanni P. Deretta