
Hiya all. I already briefly discussed this with Jonathan Turkan, and I understood he is interested in continuing on a new project once IOStreams is accepted. I'd like to work together with him on this new project. In the past week I've done some "research" (in quotes because I am still very new to this and can't call myself knowledgeable yet). I'd like to start an early kick-off with this post ;).

IOStreams can be seen as a library that allows one to define the following:

Source --> filter --> filter --> Sink

The relationship between the types involved can be delayed till runtime, but each individual type above has to be coded before that. It therefore makes most sense to look at the above as instances instead of classes (since I already drew in the relationships between the different types):

Source_instance --> filter_instance --> filter_instance --> Sink_instance

But for brevity I will not type the '_instance' part anymore. Obviously, we can have multiple Sources and Sinks at the same time:

Source --> Sink
Source --> filter --> filter --> Sink
Source --> filter --> filter --> filter --> Sink
Source --> filter --> Sink
...

Each of these Sources will need to be handled by the Operating System; for example if they are sockets. In order to make an application work, one will need "event demultiplexing", as provided for instance by select() or poll(). In order to write a framework that can "connect" the Sources and Sinks to this event demultiplexor in a flexible, portable and efficient way, a lot of design decisions need to be taken. Below I have made a list of a few topics that we can discuss, one by one, in order to make progress with these decisions.
I am aware that a start has already been made on a design for such a framework with boost in mind (see http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?BoostSocket). This design mainly considers sockets, but I think that a much wider approach is necessary (as we can discuss by following the points below). But of course we will use anything of this project that is useful.

Another important source of experience is libACE. Although libACE provides everything we might need, it doesn't 'connect' well to boost; it has a lot of interfaces that are already done in boost, but in another way. Also, I think that libACE is doing more than we need and not everything that we might want. If it were possible to write a more 'native' event demultiplexing library for boost, then that would definitely add useful and additional functionality for the open source community. Apart from that, there is the license issue. I believe that the boost libraries are more 'free' than ACE, so in some cases ACE might not even be an option.

==========================================================

Follows: Topics that can be discussed. I'd like to urge everyone to stick to one Topic per thread.

1. Is there a need for this library? Should it be a new effort like proposed, or do we have to revive 'boost Sockets' and only build upon that without throwing things away?

If "yes - there is a need for it and we might as well start with a clean slate": for the below I have chosen to phrase the topics as statements that I believe I can give sufficient argumentation for (if needed).

2. It is unavoidable that this library uses threads.

3. User programs compiled for platform A, and linking with a shared version of this library, do not need to be able to run on platform B (with the same architecture) without recompilation.

4. It is unacceptable that we would depend on other libraries (libevent, libACE) for this, except standard libraries (libc, libstdc++, socket, ws2_32, etc).

5. We should heavily lean on the expertise that is condensed in libACE for things like the following: there must be a concept "Event Handler" (the 'call back'), and there must be "Acceptor", "Connector" and "Handle" concepts.

6. The ideas about I/O filtering as discussed for Apache (see http://www.serverwatch.com/tutorials/article.php/1129721) need to be analyzed and probably implemented in IOStreams before we can really design the demultiplexor library.

I will start a thread for each of these topics. Please reply only to the first subject that you do not agree with (yet): it makes no sense to participate in discussions with a higher number if you disagree with the statement of one with a lower number :)

Regards,

-- Carlo Wood <carlo@alinoe.com>

1. Is there a need for this library? Should it be a new effort like proposed, or do we have to revive 'boost Sockets' and only build upon that without throwing things away?
In my opinion: yes. This opinion is based on the impressions that I got from studying the existing code/ideas for a full week, and on the ideas that I have myself about what is needed. There is a gap here: a new library will be able to address all my needs; the existing designs do not. One major shortcoming of the existing ones is that they are not IOStreams aware.

"Carlo Wood" wrote:
In my opinion: yes.
I would agree. But then, I am biased and joined boost mailing list for selfish reasons, looking for semi-standard, efficient and top of the crop libraries that I either did or will have to write myself. I was actually looking into ACE before 'discovering' boost so the direction in which you would like to go is something I can very well associate with. Thumbs up! Tony

Thanks! I am looking forward to your comments on topics 2 through 6. I've already got a 7. prepared ;) - but I think I had better wait with that until a few people have gotten to that point. On Sun, Sep 12, 2004 at 09:01:24AM -0400, Tony Juricic wrote:
"Carlo Wood" wrote:
In my opinion: yes.
I would agree. But then, I am biased and joined boost mailing list for selfish reasons, looking for semi-standard, efficient and top of the crop libraries that I either did or will have to write myself. I was actually looking into ACE before 'discovering' boost so the direction in which you would like to go is something I can very well associate with.
Thumbs up!
Tony
-- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
1. Is there a need for this library? Should it be a new effort like proposed, or do we have to revive 'boost Sockets' and only build upon that without throwing things away?
I am actually unsure what the scope of this proposed library is. It covers a demultiplexor; what else? There is likely some additional machinery needed to get from a system-level source to what IOStreams (or whatever) operates on. I think the demultiplexor should be considered separately from the other proposed features, including even the actual handling of I/O or any blocking operations. Aaron W. LaFramboise

On Mon, Sep 13, 2004 at 12:41:53AM -0500, Aaron W. LaFramboise wrote:
Carlo Wood wrote:
1. Is there a need for this library? Should it be a new effort like proposed, or do we have to revive 'boost Sockets' and only build upon that without throwing things away?
I am actually unsure what the scope of this proposed library is. It covers a demultiplexor. What else? There is likely some additional machinery needed to get from system-level source to what IOStreams (or whatever) operates on. I think the demultiplexor should be considered separately from other proposed features, including even actual handling of I/O or any blocking operations.
You might be surprised - but I 100% agree with you :) Actually, I am only interested in coding a *minimal* demultiplexor library. Unfortunately, it is probably not possible to give that library a portable interface without also dragging different types of 'devices' into it: a socket isn't a fifo (or else there are too many differences).

The starting point will be 'handles' - but while on UNIX *all* handles are 'int' (allowing a library like libevent, see http://monkey.org/~provos/libevent/), there are other OSes that use different, incompatible handles. Unless we know in advance exactly which Operating Systems and which devices we want to support, it is not possible to define a portable interface. The only safe thing to do, therefore, is to fall back to concepts that are defined outside the Operating Systems, like "IP address", "TCP/IP connection", "timer" - things that can ALWAYS be implemented on any OS.

Nevertheless, despite that fact, I'd like to provide a (non-portable) interface to a smaller part of the whole at first, and wrap a portable interface around that later. -- Carlo Wood <carlo@alinoe.com>

On Mon, 13 Sep 2004, Carlo Wood wrote:
You might be surprised - but I 100% agree with you :) Actually, I am only interested in coding a *minimal* demultiplexor library.
Unfortunately, it is probably not possible to give that library a portable interface without also dragging different types of 'devices' into it: a socket isn't a fifo (or else there are too many differences).
Why not just resume work on the Boost::Socket library? I can't remember the link offhand, but IIRC the design was pretty solid and had hooks for multiplexing, as well as a nearly complete blocking I/O implementation. Sean

2. It is unavoidable that this library uses threads.
There are two main reasons that I see:

1) On Windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.

2) On Windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleObjects (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)).

More in general, however, there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all the different types are the same (i.e., 'int' on UNIX). Consider the major differences between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads.

Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOStreams functions) should happen in the same thread again; this new library should shield the use of threads for the user as much as possible! -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
2. It is unavoidable that this library uses threads.
I disagree strongly. I think spawning additional threads is both unnecessary and undesirable.
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
In the case of sockets, Winsock has other mechanisms for scaling in this respect, such as I/O completion routines. On pre-Winsock2 platforms, which hopefully are dwindling, I don't think falling back to the 64-handle limit will be a problem. It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread and another multiplex. I don't think the multiplex should do this.
2) On Windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleObjects (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general, however, there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all the different types are the same (i.e., 'int' on UNIX). Consider the major differences between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads. Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOStreams functions) should happen in the same thread again; this new library should shield the use of threads for the user as much as possible!
I also don't think the multiplex should do this. Boost shouldn't second-guess what the user is trying to do. If the user knows he needs two separate threads to handle two separate resources, then let the user create two threads and put a multiplex in each.

By _multiplex_ I mean the class (or whatever entity) that implements the core of the demultiplexing of various resources. (I'm using this name because that's what I called it in my own library.) I believe this class should have these characteristics:

1) Minimal - It should handle every sort of event that might need to be handled, but nothing more. More complex logic, such as pooling and balancing, should be handled elsewhere, possibly by a derived class. In addition, the design should be as unsophisticated as possible. In particular, event notification might be simple functors (no 'observer' frameworks).

2) Efficient - For many applications, performance will be paramount. Many asynchronous algorithms will depend on the multiplex core having negligible overhead, and Boost should not disappoint. As it may be a crucial building block of nearly any real-world program, it should also be storage efficient, so as not to rule out application in embedded areas.

3) Compatible - See http://article.gmane.org/gmane.comp.lib.boost.devel/109475

It is my opinion, in fact, that this multiplex class should be in its own library, isolated from any other particular library that would depend on it. In other words, it wouldn't be any more coupled with I/O than it would be with Boost.Thread or date_time. Aaron W. LaFramboise

On Sun, 12 Sep 2004 22:18:05 -0500, Aaron W. LaFramboise wrote
Carlo Wood wrote:
2. It is unavoidable that this library uses threads.
I disagree strongly. I think spawning additional threads is both unnecessary and undesirable.
My guess is that Carlo was saying that the library must be thread-safe at a minimum. That is, I could, for example, add/release an event callback from a different thread than the one the multiplexor is running in. But even if he meant more, my take is we might want some threading capabilities... see below.
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
In the case of sockets, Winsock has other mechanisms for scaling in this respect, such as I/O completion routines. On pre-Winsock2 platforms, which hopefully are dwindling, I don't think falling back to the 64 handle limit will be a problem.
It's easy to blow this limit if you start monitoring any significant hardware. Start monitoring some serial ports, setting various timeouts associated with those ports, and you can run into trouble easily. In fact, timers are a big problem -- you need to have a smart queuing implementation that keeps the number of timers down to the bare minimum....
It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread and another multiplex. I don't think the multiplex should do this.
Well, it's ugly for the user because it's tough to predict when you are going to hit the 64. So I disagree; I'd like to see the user shielded from this issue.
2) On Windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleObjects (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general, however, there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all the different types are the same (i.e., 'int' on UNIX). Consider the major differences between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads. Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOStreams functions) should happen in the same thread again; this new library should shield the use of threads for the user as much as possible!
I also don't think the multiplex should do this. Boost shouldn't second-guess what the user is trying to do. If the user knows he needs two separate threads to handle two separate resources, then let the user create two threads and put a multiplex in each.
Well, I think there might need to be some interface here. For example, it would be nice if the multiplexor had a pool of threads and dispatched each event to execute in a thread. The size of that pool might be '0', in which case the multiplexor uses its own thread to dispatch in -- hence degenerating into a single-threaded arrangement.
By _multiplex_ I mean the class (or whatever entity) that implements the core of the demultiplexing of various resources. (I'm using this name because that's what I called it in my own library.) I believe this class should have these characteristics:
1) Minimal - It should handle every sort of event that might need to be handled, but nothing more. More complex logic, such as pooling and balancing, should be handled elsewhere, possibly by a derived class. In addition, the design should be as unsophisticated as possible. In particular, event notification might be simple functors (no 'observer' frameworks)
I'd like to see a template approach (see below) that allows new multiplexor and event handler types to be added in as they are developed. The core then just sets up and manages the core of the dispatching.
2) Efficient - For many applications, performance will be paramount. Many asynchronous algorithms will depend on the multiplex core having negligible overhead, and Boost should not disappoint. As it may be a crucial building block of nearly any real-world program, it should also be storage efficient, so as not to rule out application in embedded areas.
Agreed. BTW, I'd like to see an attempt to remove all virtual methods from the mix.
3) Compatible - See http://article.gmane.org/gmane.comp.lib.boost.devel/109475
It is my opinion, in fact, that this multiplex class should be in its own library, isolated from any other particular library that would depend on it. In other words, it wouldn't be any more coupled with I/O than it would be with Boost.Thread or date_time.
Well, I'll disagree with this one as well. I think it should be coupled with both thread and date_time, since you picked those two ;-) Here's why. I think the interface should look something like this:

class multiplexor {
public:
  //returns a unique event handler id
  //(named register_handler because 'register' is a reserved word)
  template<class EventHandler, class EventMultiplexor>
  boost::int32_t register_handler(EventHandler eh, EventMultiplexor em,
                                  unsigned int priority);
  void remove(boost::int32_t event_handler_id);
  void suspend(boost::int32_t event_handler_id);
  void run_event_loop(time_duration td = time_duration(pos_infinity));
  void end_event_loop(time_duration td = time_duration(pos_infinity));
};

Note that the amount of time to run the event loop is specified as a time_duration, which allows things like 'pos_infinity' (run forever) to be specified more cleanly than the typical interface, which passes '0' to mean run forever. Now I can write code that looks like this, and it's perfectly clear what it means:

multiplexor m(...); //setup
while (!done) {
  m.run_event_loop(seconds(1));
  //do other stuff like set done
}

So there's the hook to date_time. BTW, for this part of date_time you only need headers -- you don't need to link the lib. As for boost.thread, that will be needed because the multiplexor implementation of register_handler will need to manage a list of event handlers, and will need to be capable of dealing with remove, suspend, and register_handler running in different user threads. This means it will need to lock. So even if you argue away date_time I don't see how you avoid boost.thread. Jeff

Jeff Garland wrote:
On Sun, 12 Sep 2004 22:18:05 -0500, Aaron W. LaFramboise wrote
Carlo Wood wrote:
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
In the case of sockets, Winsock has other mechanisms for scaling in this respect, such as I/O completion routines. On pre-Winsock2 platforms, which hopefully are dwindling, I don't think falling back to the 64 handle limit will be a problem.
It's easy to blow this limit if you start monitoring any significant hardware. Start monitoring some serial ports, setting various timeouts associated with those ports, and you can run into trouble easily. In fact, timers are a big problem -- you need to have a smart queuing implementation that keeps the number of timers down to the bare minimum....
I'm not sure I follow you here. The limitation is specifically this:

#define MAXIMUM_WAIT_OBJECTS 64 // Maximum number of wait objects

That is the maximum number of objects you can pass to WaitForMultipleObjectsEx(). In the case of serial ports, you only need one of these per port. These are reusable, and several user-visible events may depend on a single system object. Outside of the networking problem, which has a separate solution (APC callbacks, which as far as I know are unlimited), it is difficult for me to see how you would use more than 64 of these system objects.

For timers specifically, I do not think there is any need to use any of the Win32 timer facilities at all. WaitForMultipleObjectsEx() has a millisecond timeout parameter which hopefully is 'good enough' for timing needs. In my implementation, timers are organized into a queue, with the difference between now and the timer at the top used as the timeout value.
It seems unlikely to me that there are many cases were the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread, and another multiplex. I don't think the multiplex should do this.
Well, it's ugly for the user because it's tough to predict when you are going to hit the 64. So I disagree, I'd like to see the user shielded from this issue.
I guess I'm thinking this will be a very rare event - rare enough that a user will know for sure that they need an extra thread. I can't think of how it would happen. The only way I can see it happening is on an older system where APCs are unavailable, but these tend not to have good support for things like reading from 64 files all at once anyway. I wouldn't be opposed to a higher-level multiplexor abstraction, building on the core multiplexor abstraction, that did spawn extra threads as needed, though.
Well, I think there might need to be some interface here. For example, it would be nice if the multiplexor had a pool of threads and dispatched each event to execute in a thread. The size of that pool might be '0', in which case the multiplexor uses its own thread to dispatch in -- hence degenerating into a single-threaded arrangement.
I agree. However, I think this could be designed best by having the core multiplexor thread-ignorant, with a separate wrapper component around that core doing the thread management.
I'd like to see a template approach (see below) that allows new multiplexor and event handler types to be added in as they are developed. The core then just sets up and manages the core of the dispatching.
This sounds excellent.
2) Efficient - For many applications, performance will be paramount. Many asynchronous algorithms will depend on the multiplex core having negligable overhead, and Boost should not disappoint. As it may be a crucial building block of nearly any real-world program, it should also be storage efficient, to not rule out application in embedded areas.
Agreed. BTW, I'd like to see an attempt to remove all virtual methods from the mix.
I also agree. For various reasons, I was unable to do this in my own implementation, but I think it should be avoidable.
3) Compatible - See http://article.gmane.org/gmane.comp.lib.boost.devel/109475
It is my opinion, in fact, that this multiplex class should be in its own library, isolated from any other particular library that would depend on it. In other words, it wouldn't be any more coupled with I/O than it would be with Boost.Thread or date_time.
Well, I'll disagree with this one as well. I think it should be coupled with both thread and date_time, since you picked those two ;-)
Here's why. I think the interface should look something like this:
Note that the amount of time to run the event loop is specified as a time_duration, which allows things like 'pos_infinity' (run forever) to be specified more cleanly than the typical interface, which passes '0' to mean run forever. Now I can write code that looks like this, and it's perfectly clear what it means:
multiplexor m(...); //setup
while (!done) {
  m.run_event_loop(seconds(1));
  //do other stuff like set done
}
So there's the hook to date_time. BTW, for this part of date_time you only need headers -- you don't need to link the lib.
You're right. date_time will need to be coupled.
As for boost.thread, that will be needed because the multiplexor implementation of register will need to manage a list of event handlers, and will need to be capable of dealing with remove, suspend, and register running in different user threads. This means it will need to lock. So even if you argue away date_time I don't see how you avoid boost.thread.
I still think that this management should be done in a component separable from the core demultiplexing. But yes, you're right: there will be some dependency from the multiplexor library as a whole on threads, too. Aaron W. LaFramboise

On Mon, Sep 13, 2004 at 12:19:07AM -0500, Aaron W. LaFramboise wrote:
I'm not sure I follow you here. The limitation is specifically this: #define MAXIMUM_WAIT_OBJECTS 64 // Maximum number of wait objects That is the maximum number of objects you can pass to WaitForMultipleObjectsEx(). In the case of serial ports, you only need one of these per port. These are reusable, and several user-visible events may depend on a single system object. Outside of the networking problem, which has a separate solution (APC callbacks, which as far as I know are unlimited), it is difficult for me to see how you would use more than 64 of these system objects.
We will only need one event for all timers, one event for all signal handlers (at least, that suffices), one event per hardware port (not likely to be a lot), one event per open file descriptor (also not likely to be a lot), one event per fifo and pipe (still, in most cases not causing us to go over the 64 limit)... You even only need one event for UDP communication (just use a single port, even if you communicate with a lot of clients). The only real problem I see are SOCK_STREAM sockets.
For timers specifically, I do not think there is any need to use any of the Win32 timer facilities at all. WaitForMultipleObjectsEx() has a millisecond timeout parameter which hopefully is 'good enough' for timing needs. In my implementation, timers are organized into a queue, with the difference between now and the timer at the top used as the timeout value.
Sounds like a plan. Special timers with higher accuracy can be added later anyway. In most cases 1 millisecond accuracy has to be enough anyway, because dispatching events from the mainloop takes time - and during all that time the user won't be notified about new events (we have to return to the mainloop first). The timers will, inherent to the concept of a Reactor (and mainloop / dispatching), be a bit fuzzy anyway. I think that microsecond-accurate timers will need to be implemented for the larger part in user code: users will need to create their own high-priority thread for handling them - and then they might as well wait for the timer in that thread as well (or in another thread they create). But then again, we can of course add a little support for such an event, just like we will add support for named pipes and whatever other devices are out there.
Well, it's ugly for the user because it's tough to predict when you are going to hit the 64. So I disagree, I'd like to see the user shielded from this issue.
I guess I'm thinking this will be a very rare event, rare enough that a user will know for sure that they need an extra thread. I can't think of how it would happen. The only way I can see it happening is on an
I strongly disagree. There is nothing rare about needing a few hundred sockets. "640 kB ought to be enough for anyone" is a famous sentence of a Mr. B. Gates; don't make the same mistake :p
older system where APCs are unavailable, but these tend not to have good support for things like reading from 64 files all at once, anyway.
I wouldn't be opposed to a higher level multiplexor abstraction that builds on the core multiplexor abstraction that did spawn extra threads as needed, though.
Ah! But that is the proposal! As I said in the original post: "this new library should shield the use of threads for the user as much as possible!" To the user, the NEED for threads should be invisible. If the result is that "-lpthread" will be a dependency of the multiplex library, then so be it: we can't continue to support all kinds of decade-old operating systems... all the OSes that I am interested in supporting with something as sophisticated as this have working thread support. What I am trying to say is that I have absolutely no problem with it if boost.multiplexor won't compile on a system where boost.thread doesn't compile. [...]
Agreed. BTW, I'd like to see an attempt to remove all virtual methods from the mix.
I also agree. For various reasons, I was unable to do this in my own implementation, but I think it should be avoidable.
I doubt that it will be possible. But we can try :)
I still think that this management should be done in a component separable from the core demultiplexing. But yes, you're right: there will be some dependency from the multiplexor library as a whole on threads, too.
Good! :) On to point 3. then :) -- Carlo Wood <carlo@alinoe.com>

On Mon, 13 Sep 2004 00:19:07 -0500, Aaron W. LaFramboise wrote
I'm not sure I follow you here. The limitation is specifically this: #define MAXIMUM_WAIT_OBJECTS 64 // Maximum number of wait objects That is the maximum number of objects you can pass to WaitForMultipleObjectsEx(). In the case of serial ports, you only need one of these per port. These are reusable, and several user-visible events may depend on a single system object. Outside of the networking problem, which has a separate solution (APC callbacks, which as far as I know are unlimited), it is difficult for me to see how you would use more than 64 of these system objects.
Let's say you have a piece of hardware and you want to monitor a bunch of serial ports. Let's say you have 2 timers per serial port, plus the port itself: that's 3 wait objects per port, so your total capacity is ~21 ports. Then you have a few socket connections and things are reduced even more. Let me put it another way -- I know of 2 projects that have teetered on the brink of this limit. In both cases, it was a huge design issue... If it turns out that it is next to impossible to solve reasonably, then I'm OK with backing off, but I hate to see us throw in the towel before we've tried...
For timers specifically, I do not think there is any need to use any of the Win32 timer facilities at all. WaitForMultipleObjectsEx() has a millisecond timeout parameter which hopefully is 'good enough' for timing needs. In my implementation, timers are organized into a queue, the difference between now and the timer on the top used for the timeout value.
That's good -- ACE uses timer queues to solve this problem as well. What I was getting at is that the OS facilities have to be extended or you will run into the timer limit very quickly.
I guess I'm thinking this will be a very rare event, rare enough that a user will know for sure that they need an extra thread. I can't think of how it would happen. The only way I can see it happening is on an older system where APCs are unavailable, but these tend not to have good support for things like reading from 64 files all at once, anyway.
I assure you, the projects this occurred on were using the latest versions of the OS. In one case it was hardware augmentation that pushed things up to the limit. In the other it was sheer software complexity/connectivity.
I wouldn't be opposed to a higher-level multiplexor abstraction that builds on the core multiplexor abstraction and spawns extra threads as needed, though.
That's what I think we should strive for.
Well, I think there might need to be some interface here. For example, it would be nice if the multiplexor had a pool of threads and dispatched each event to execute in a thread. The size of that pool might be '0', in which case the multiplexor uses its own thread to dispatch in -- hence degenerating into a single-threaded arrangement.
I agree. However, I think this could be designed best by having the core multiplexor thread-ignorant, with a separate wrapper component around that core doing the thread management.
Perhaps, but I don't see how even the core multiplexor can be implemented to operate in an MT environment without locking. It could be that we can have a locking policy that can be null in the case of single threading... Jeff
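The null-lock policy mentioned above could look something like the following sketch. All names here are hypothetical; std::mutex stands in for Boost.Thread's boost::mutex, and the registry is reduced to a vector so the policy mechanism stays visible. The single-threaded build instantiates the core with `null_lock` and pays nothing for synchronization.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical lock policies: the multiplexor core is written against a
// minimal Lock concept (lock()/unlock()), so single-threaded users get a
// no-op lock compiled in.
struct null_lock {
    void lock() {}
    void unlock() {}
};

using real_lock = std::mutex;   // stand-in for boost::mutex

template <class Lock>
class handler_registry {
public:
    // Returns the number of registered handlers after insertion.
    int add(int handler_id) {
        std::lock_guard<Lock> guard(lock_);   // no-op for null_lock
        handlers_.push_back(handler_id);
        return static_cast<int>(handlers_.size());
    }

    std::size_t size() const { return handlers_.size(); }

private:
    Lock lock_;
    std::vector<int> handlers_;
};
```

The same source then serves both builds: `handler_registry<null_lock>` for the single-threaded case and `handler_registry<real_lock>` when user threads call in concurrently.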

Jeff Garland wrote:
On Mon, 13 Sep 2004 00:19:07 -0500, Aaron W. LaFramboise wrote
I'm not sure I follow you here. The limitation is specifically this:

    #define MAXIMUM_WAIT_OBJECTS 64 // Maximum number of wait objects

That is the maximum number of objects you can pass to WaitForMultipleObjectsEx(). In the case of serial ports, you only need one of these per port. These are reusable, and several user-visible events may depend on a single system object. Outside of the networking problem, which has a separate solution (APC callbacks, which as far as I know are unlimited), it is difficult for me to see how you would use more than 64 of these system objects.
Let's say you have a piece of hardware and you want to monitor a bunch of serial ports. Say you have 2 timers per serial port plus the port itself: your total capacity is ~21 ports. Then you add a few socket connections and things are reduced even more. Let me put it another way -- I know of 2 projects that have teetered on the brink of this limit. In both cases, it was a huge design issue...
I think I might be miscommunicating. Normal events will not require an object handle slot. The only things that will require these are resources that can't use APCs/completion ports, Windows messages, or the timeout. All of the obvious things such as sockets, files, and timers are handled by the aforementioned features. The only things left over, that will require these slots, are notification handles, processes and threads, synchronization primitives, and a few miscellaneous odds and ends that Win95 can't do with APCs (such as serial ports). In general, I think the feeling is that 64 is adequate because you're probably doing something wrong if you think you need more than this. There may be a very few rare cases when this is insufficient, such as if you were monitoring 70 directories for changes at once, but for people who need to do that, I don't feel bad about requiring them to do something special. (I'm not even sure if Windows has the technical capability to do that.)
For timers specifically, I do not think there is any need to use any of the Win32 timer facilities at all. WaitForMultipleObjectsEx() has a millisecond timeout parameter which hopefully is 'good enough' for timing needs. In my implementation, timers are organized into a queue, and the difference between now and the timer on top is used for the timeout value.
That's good -- ACE uses timer queues to solve this problem as well. What I was getting at is that the OS facilities have to be extended or you will run into the timer limit very quickly.
OK.
I wouldn't be opposed to a higher-level multiplexor abstraction that builds on the core multiplexor abstraction and spawns extra threads as needed, though.
That's what I think we should strive for.
OK.
Well, I think there might need to be some interface here. For example, it would be nice if the multiplexor had a pool of threads and dispatched each event to execute in a thread. The size of that pool might be '0', in which case the multiplexor uses its own thread to dispatch in -- hence degenerating into a single-threaded arrangement.
I agree. However, I think this could be designed best by having the core multiplexor thread-ignorant, with a separate wrapper component around that core doing the thread management.
Perhaps, but I don't see how even the core multiplexor can be implemented to operate in an MT environment without locking. It could be that we can have a locking policy that can be null in the case of single threading...
This is an area I am not entirely sure of. In my own designs, I have always used one demultiplexor object per thread, and there is little to be synchronized. I have found value in having the 'register event listener' method be thread-safe, allowing other threads to feed events to a thread in a direct manner. Beyond that, though, I don't think the demultiplexor core needs to know much about threads. Aaron W. LaFramboise
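The per-thread arrangement described above can be sketched as follows. The names (`demultiplexor`, `post`, `run`) are invented for illustration, and std::thread/std::function stand in for what 2004 code would build on Boost.Thread. Only the posting entry point is synchronized; the dispatch loop and the callbacks it runs stay in the owning thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Sketch: one demultiplexor object per thread.  Only post() is
// thread-safe, so other threads can feed events in directly; everything
// else belongs to the owning thread.
class demultiplexor {
public:
    // Thread-safe: may be called from any thread.
    void post(std::function<void()> event) {
        {
            std::lock_guard<std::mutex> guard(m_);
            queue_.push_back(std::move(event));
        }
        cv_.notify_one();
    }

    // Not thread-safe: run by the owning thread only.
    // Dispatches exactly n events, blocking until they arrive.
    void run(int n) {
        for (int i = 0; i < n; ++i) {
            std::function<void()> ev;
            {
                std::unique_lock<std::mutex> guard(m_);
                cv_.wait(guard, [this] { return !queue_.empty(); });
                ev = std::move(queue_.front());
                queue_.pop_front();
            }
            ev();   // callback runs in the owner's thread
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
};
```

A program with two event-handling threads simply instantiates two of these, one owned by each thread, and any thread may post to either.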

It's easy to blow this limit if you start monitoring any significant hardware. Start monitoring some serial ports, setting various timeouts associated with those ports, and you can run into trouble easily. In fact, timers are a big problem -- you need a smart queuing implementation that keeps the number of timers down to the bare minimum....
We will need one queue per user thread that waits for events. If the user uses threads and requests a timer event, then at the very least it should be supported to call a 'callback function' (a virtual function of a provided object) in that same thread. Possibly we even want to add support for having the callback called in a different thread. However, that is just a single event per waiting user thread. Timers will not become a problem that pushes us over the 64 limit.
It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread, and another multiplex. I don't think the multiplex should do this.
Well, it's ugly for the user because it's tough to predict when you are going to hit the 64-object limit. So I disagree; I'd like to see the user shielded from this issue.
My opinion too. Shielding the user from implementation specific things (that vary from Operating System to Operating System) is a Good Thing(tm). Otherwise it will be hard to write portable code :/ [more things I agree with snipped]
Well, I'll disagree with this one as well. I think it should be coupled with both thread and date_time, since you picked those two ;-)
Agreed
Here's why. I think the interface should look something like this:
class multiplexor {
public:
    // returns a unique event handler id
    // ('register' alone is a reserved word in C++, hence the longer name)
    template<class EventHandler, class EventMultiplexor>
    boost::int32_t register_handler(EventHandler eh, EventMultiplexor em,
                                    unsigned int priority);

    void remove(boost::int32_t event_handler_id);
    void suspend(boost::int32_t event_handler_id);

    void run_event_loop(time_duration td = time_duration(pos_infinity));
    void end_event_loop(time_duration td = time_duration(pos_infinity));
};
Note that the amount of time to run the event loop is specified as a time_duration, which allows things like 'pos_infinity' (run forever) to be specified more cleanly than the typical interface, which passes '0' meaning run forever. Now I can write code like the following, and it's perfectly clear what it means:
multiplexor m(...); //setup
while (!done) {
    m.run_event_loop(seconds(1));
    // do other stuff, like set done
}
So there's the hook to date_time. BTW, for this part of date_time you only need headers -- you don't need to link the lib.
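The date_time hook described above mostly reduces to mapping the duration onto the millisecond timeout that a WaitForMultipleObjectsEx-style call expects. A minimal sketch, with std::chrono standing in for boost::date_time and `milliseconds::max()` standing in for time_duration(pos_infinity) -- the names `to_wait_timeout` and `wait_forever` are invented here, not any real API:

```cpp
#include <cassert>
#include <chrono>

// "Wait forever" sentinel (INFINITE on Win32; here just the usual value).
const unsigned long wait_forever = 0xFFFFFFFFul;

// Map the public duration-based interface onto the millisecond timeout
// of the underlying wait primitive.  The infinite duration never reaches
// the cast, so there is no overflow.
inline unsigned long to_wait_timeout(std::chrono::milliseconds td) {
    if (td == std::chrono::milliseconds::max()) return wait_forever;
    return static_cast<unsigned long>(td.count());
}
```

This is why only date_time headers are needed: the duration type is consumed at the boundary and never crosses into the OS call.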
As for Boost.Thread, that will be needed because the multiplexor implementation of register will need to manage a list of event handlers and will need to be capable of dealing with remove, suspend, and register running in different user threads. This means it will need to lock. So even if you argue away date_time, I don't see how you avoid Boost.Thread.
My opinion too; we need to support multi-threaded user applications and therefore we have to link with Boost.Thread. This reason is separate from the fact that I can't see how to implement this within a single thread on Windows - even when considering only a few sockets, so that the event limit is not the problem. -- Carlo Wood <carlo@alinoe.com>

On Sun, Sep 12, 2004 at 10:18:05PM -0500, Aaron W. LaFramboise wrote:
Carlo Wood wrote:
2. It is unavoidable that this library uses threads.
I disagree strongly. I think spawning additional threads is both unnecessary and undesirable.
I have no problem whatsoever with sticking to a single thread if you are right. But I am convinced it will not be possible to do this without threads. If you know otherwise, then I suggest you tell me the solution for the problems I will run into when trying to code this within one thread.
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
In the case of sockets, Winsock has other mechanisms for scaling in this respect, such as I/O completion routines. On pre-Winsock2 platforms, which hopefully are dwindling, I don't think falling back to the 64 handle limit will be a problem.
I have no problem with not supporting winsock1, or with at most 64 sockets. But it could be supported without the user even KNOWING that additional threads were created - why would it be bad to do that?
It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread, and another multiplex. I don't think the multiplex should do this.
Why not? This is what libACE does too; I am interested to hear why you think it is wrong.
2) On windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleObjects (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general, however, there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all different types are the same (ie, 'int' on UNIX). Consider the major difference between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads. Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOStreams functions) should happen in the same thread again; this new library should shield the use of threads from the user as much as possible!
I also don't think the multiplex should do this. Boost shouldn't second-guess what the user is trying to do. If the user knows he needs two separate threads to handle two separate resources, then let the user create two threads and put a multiplex in each.
But while that might be possible on GNU/Linux, it might be impossible on windows (for example). The demand to provide a portable interface therefore forces us to create (hidden) threads. If a user decides that he only needs one thread and the library is not allowed to implement the requested interface by running two or more threads internally, then how can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time? That is possible on linux, but I have failed to figure out how this can be done on windows :/
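The hidden-thread arrangement argued for here can be sketched as follows: each kind of resource gets an internal watcher thread that blocks in whatever wait primitive the OS offers for it, and every watcher funnels completions into one queue, so the user still sees a single-threaded event loop. This is an illustrative sketch with invented names; the watchers just hand over a label instead of really blocking in WaitForMultipleObjects or WSAWaitForMultipleEvents.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// One shared funnel: hidden watcher threads push completion notices,
// the user's single loop drains them.
class event_funnel {
public:
    // Called by the hidden watcher threads.
    void push(const std::string& tag) {
        std::lock_guard<std::mutex> guard(m_);
        done_.push(tag);
        cv_.notify_one();
    }

    // Called by the user's one and only event loop.
    std::string wait_next() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return !done_.empty(); });
        std::string tag = done_.front();
        done_.pop();
        return tag;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> done_;
};
```

From the user's point of view there is one blocking call, `wait_next()`, regardless of how many internal threads are parked in OS-specific wait functions behind it.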
By _multiplex_ I mean the class (or whatever entity) that implements the core of the demultiplexing of various resources. (I'm using this name because that's what I called it in my own library.) I believe this class should have these characteristics:
1) Minimal - It should handle every sort of event that might need to be handled, but nothing more. More complex logic, such as pooling and balancing, should be handled elsewhere, possibly by a derived class.
Agreed.
In addition, the design should be as unsophisticated as possible. In particular, event notification might be simple functors (no 'observer' frameworks)
How would one use functors to wait for the plethora of different events to be handled? Surely not as a template parameter of the multiplexor class. You mean as a template parameter of methods of that class? That would still involve a dereference (some virtual function call) in the end, somewhere, imho; you don't seem to gain anything from this in terms of inlining (the main reason for functors, I thought). Templates do, however, tend to cause code bloat :/
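One concrete reading of "simple functors" is templating the registration *method* and storing the callback type-erased (boost::function in 2004; std::function in this sketch). The names `callback_table`, `register_callback`, and `dispatch` are invented for illustration. The single stored indirection makes Carlo's point visible: the caller writes plain function objects, but dispatch still goes through one indirect call.

```cpp
#include <cassert>
#include <functional>
#include <map>

class callback_table {
public:
    // Method template: any functor convertible to void() is accepted,
    // then type-erased into std::function at this point.
    template <class Functor>
    int register_callback(int fd, Functor f) {
        callbacks_[fd] = f;
        return fd;
    }

    // Invoke the callback registered for fd, if any.
    bool dispatch(int fd) {
        std::map<int, std::function<void()> >::iterator it = callbacks_.find(fd);
        if (it == callbacks_.end()) return false;
        it->second();   // the one unavoidable indirect call
        return true;
    }

private:
    std::map<int, std::function<void()> > callbacks_;
};
```

So the template buys a convenient call site, not inlined dispatch: the indirection simply moves from a virtual function into the type-erased wrapper.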
2) Efficient - For many applications, performance will be paramount. Many asynchronous algorithms will depend on the multiplex core having negligible overhead, and Boost should not disappoint. As it may be a crucial building block of nearly any real-world program, it should also be storage efficient, to not rule out application in embedded areas.
Hmm. I agree with the efficiency in terms of CPU. But as always, storage efficiency and CPU efficiency are each other's counterparts. You cannot pursue both at the same time. I think that embedded applications need a different approach - they are a different field. It will not necessarily be possible to serve both high-performance, real-time server applications AND embedded applications at the same time. In that case I will choose the high-end server applications every time :/ (because that is where my personal interests are).
3) Compatible - See http://article.gmane.org/gmane.comp.lib.boost.devel/109475
It is my opinion, in fact, that this multiplex class should be in its own library, isolated from any other particular library that would depend on it. In other words, it wouldn't be any more coupled with I/O than it would be with Boost.Thread or date_time.
I still think we will need threads - not only internally but even as part of the interface, to support users that WANT to write multi-threaded applications. Consider a user who has two threads running and wants to wait for events in both threads. He will then need a 'Reactor' object for both threads: both threads will need their own 'main loop'. Supporting that (and we have to support it, imho) means that the library is thread-aware. I think we have to depend on Boost.Thread as soon as threads are being used; I am not willing to duplicate the code from Boost.Thread inside Boost.Multiplexor just to be independent of Boost.Thread. -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
On Sun, Sep 12, 2004 at 10:18:05PM -0500, Aaron W. LaFramboise wrote:
Carlo Wood wrote:
2. It is unavoidable that this library uses threads.
I disagree strongly. I think spawning additional threads is both unnecessary and undesirable.
I have no problem whatsoever with sticking to a single thread if you are right. But I am convinced it will not be possible to do this without threads. If you know otherwise, then I suggest you tell me the solution for the problems I will run into when trying to code this within one thread.
Nothing irritates me more than using some third party library and noticing it's spawning some hidden thread that I just know it doesn't need and that isn't adding any real value. Especially as I have implemented such a demultiplexor core without sophisticated built-in thread support that was nonetheless very useful in conjunction with multithreading, I know that if Boost's multiplexor uses threading, I would not use it.
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
In the case of sockets, Winsock has other mechanisms for scaling in this respect, such as I/O completion routines. On pre-Winsock2 platforms, which hopefully are dwindling, I don't think falling back to the 64 handle limit will be a problem.
I have no problem with not supporting winsock1, or with at most 64 sockets. But it could be supported without the user even KNOWING that additional threads were created - why would it be bad to do that?
I suppose it could also answer the user's email, and download advertisements from the internet that the user might be interested in seeing. I don't see how that could be bad... I am not opposed to any of these fancy thread management features. However, I do think that there needs to be a fundamental, minimal demultiplexor core that does not have them. These extra features may be implemented by a separate module on top of the core, through delegation, inheritance, or something else.
It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread, and another multiplex. I don't think the multiplex should do this.
Why not? This is what libACE does too; I am interested to hear why you think it is wrong.
I think ACE is too complicated, unnecessarily.
2) On windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleObjects (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general, however, there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all different types are the same (ie, 'int' on UNIX). Consider the major difference between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads. Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOStreams functions) should happen in the same thread again; this new library should shield the use of threads from the user as much as possible!
I also don't think the multiplex should do this. Boost shouldn't second-guess what the user is trying to do. If the user knows he needs two separate threads to handle two separate resources, then let the user create two threads and put a multiplex in each.
But while that might be possible on GNU/Linux, it might be impossible on windows (for example). The demand to provide a portable interface therefore forces us to create (hidden) threads. If a user decides that he only needs one thread and the library is not allowed to implement the requested interface by running two or more threads internally, then how can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time? That is possible on linux, but I have failed to figure out how this can be done on windows :/
I am confused. What feature is missing on Windows? It is my perception that the Windows API is quite as expressive as anything Linux has.
By _multiplex_ I mean the class (or whatever entity) that implements the core of the demultiplexing of various resources. (I'm using this name because that's what I called it in my own library.) I believe this class should have these characteristics:
1) Minimal - It should handle every sort of event that might need to be handled, but nothing more. More complex logic, such as pooling and balancing, should be handled elsewhere, possibly by a derived class.
Agreed.
In addition, the design should be as unsophisticated as possible. In particular, event notification might be simple functors (no 'observer' frameworks)
How would one use functors to wait for the plethora of different events to be handled? Surely not as a template parameter of the multiplexor class. You mean as a template parameter of methods of that class? That would still involve a dereference (some virtual function call) in the end, somewhere, imho; you don't seem to gain anything from this in terms of inlining (the main reason for functors, I thought). Templates do, however, tend to cause code bloat :/
Well, I'm not sure. As I've mentioned in some thread, my previous design had a few cases of indirection. At the very least, the use of indirection should be minimized. On the other hand, I am not sure that I agree with the general form of multiplexor used by ACE, or mentioned by Jeff. In particular, I do not like the monolithic multiplexor that pokes its way into every module of the program. I think a multiplexor should be silent, and only seen when looked for. I also think that an 'event,' while a useful notion for implementors, is not something that needs to exist tangibly. The above comments are based on several reimplementations of the demultiplexor I mentioned that I worked on myself. I found my initial monolithic design, more similar to ACE, to be much harder to work with, for no particular advantage.
2) Efficient - For many applications, performance will be paramount. Many asynchronous algorithms will depend on the multiplex core having negligible overhead, and Boost should not disappoint. As it may be a crucial building block of nearly any real-world program, it should also be storage efficient, to not rule out application in embedded areas.
Hmm. I agree with the efficiency in terms of CPU. But as always, storage efficiency and CPU efficiency are each other's counterparts. You cannot pursue both at the same time. I think that embedded applications need a different approach - they are a different field. It will not necessarily be possible to serve both high-performance, real-time server applications AND embedded applications at the same time. In that case I will choose the high-end server applications every time :/ (because that is where my personal interests are).
I do not feel there is any need for a demultiplexor to be large. You could easily make one large, as with ACE, but I do not think it would provide an advantage, even for "high-end server applications."
3) Compatible - See http://article.gmane.org/gmane.comp.lib.boost.devel/109475
It is my opinion, in fact, that this multiplex class should be in its own library, isolated from any other particular library that would depend on it. In other words, it wouldn't be any more coupled with I/O than it would be with Boost.Thread or date_time.
I still think we will need threads - not only internally but even as part of the interface, to support users that WANT to write multi-threaded applications.
Consider a user who has two threads running and wants to wait for events in both threads. He will then need a 'Reactor' object for both threads: both threads will need their own 'main loop'. Supporting that (and we have to support it, imho) means that the library is thread-aware. I think we have to depend on Boost.Thread as soon as threads are being used; I am not willing to duplicate the code from Boost.Thread inside Boost.Multiplexor just to be independent of Boost.Thread.
I disagree with the monolithic style used by some other libraries that requires such a complex approach. The case you mentioned is particularly important to me. I have had much luck handling it by simply instantiating two demultiplexor objects - one in each thread. I do think it is a good idea to make parts of the demultiplexor object thread-safe, where there is a need and it does not slow critical operations. Aaron W. LaFramboise

On Mon, 13 Sep 2004 10:18:00 -0500, Aaron W. LaFramboise wrote
Nothing irritates me more than using some third party library and noticing it's spawning some hidden thread that I just know it doesn't need and isn't adding any real value.
I couldn't agree more.
Especially as I have implemented such a demultiplexor core without sophisticated built-in thread support that was nonetheless very useful in conjunction with multithreading, I know that if Boost's multiplexor uses threading, I would not use it.
Hopefully by now my position is clear -- thread safety is required (or at least an option) so that if the user is running MT, the multiplexor will operate correctly in an MT environment. Any other thread usage by the multiplexor would be an option controlled by the user. Just as an aside, thread usage is totally out of control these days -- and it creates lots of nasty issues.
I am not opposed to any of these fancy thread management features. However, I do think that there needs to be a fundamental, minimal demultiplexor core that does not have them. These extra features may be implemented by a separate module on top of the core, through delegation, inheritance, or something else.
Template-based policies ;-)
How would one use functors to wait for the plethora of different events to be handled? Surely not as a template parameter of the multiplexor class. You mean as a template parameter of methods of that class? That would still involve a dereference (some virtual function call) in the end, somewhere, imho; you don't seem to gain anything from this in terms of inlining (the main reason for functors, I thought). Templates do, however, tend to cause code bloat :/
Well, I'm not sure. As I've mentioned in some thread, my previous design had a few cases of indirection. At the very least, the use of indirection should be minimized.
On the other hand, I am not sure that I agree with the general form of multiplexor used by ACE, or mentioned by Jeff. In particular, I do not like the monolithic multiplexor that pokes its way into every module of the program. I think a multiplexor should be silent, and only seen when looked for. I also think that an 'event,' while a useful notion for implementors, is not something that needs to exist tangibly.
I don't follow how my interface was monolithic. Unlike most multiplexor designs I've seen, it didn't have any mention of sockets, file i/o, etc. It wasn't a singleton, either. Of course, it's just an idea -- I haven't built it -- perhaps it can't be done that way. But believe me, I'd love to see some alternate design suggestions -- I'm not locked into my current design thinking by any means. Any chance you are going to be able to post your design? Jeff

Jeff Garland wrote:
On Mon, 13 Sep 2004 10:18:00 -0500, Aaron W. LaFramboise wrote
Especially as I have implemented such a demultiplexor core without sophisticated built-in thread support that was nonetheless very useful in conjunction with multithreading, I know that if Boost's multiplexor uses threading, I would not use it.
Hopefully by now my position is clear -- thread safety is required (or at least an option) so that if the user is running MT, the multiplexor will operate correctly in an MT environment. Any other thread usage by the multiplexor would be an option controlled by the user.
As I mentioned in another email, I agree that thread safety is important for the 'register event listener' method.
Just as an aside, thread usage is totally out of control these days -- and it creates lots of nasty issues.
It's my personal view that irresponsible use of threads and poorly designed event handling are a primary cause of the general instability that continues to be prevalent in modern software.
I am not opposed to any of these fancy thread management features. However, I do think that there needs to be a fundamental, minimal demultiplexor core that does not have them. These extra features may be implemented by a separate module on top of the core, through delegation, inheritance, or something else.
Template-based policies ;-)
Sounds perfect.
I don't follow how my interface was monolithic. Unlike most multiplexor designs I've seen, it didn't have any mention of sockets, file i/o, etc. It wasn't a singleton, either. Of course, it's just an idea -- I haven't built it -- perhaps it can't be done that way. But believe me, I'd love to see some alternate design suggestions -- I'm not locked into my current design thinking by any means. Any chance you are going to be able to post your design?
For one thing, absolutely everyone who works with events within that framework needs to know about your multiplexor class, even if they don't actually need notification of low-level events. This is unnecessary. (There is one particular facet of my own implementation that I am unhappy with, which prevents me from seriously recommending it as a whole. It is present out of laziness, as I have not yet taken the time to figure out a solution.) Later today, I think I'm going to post a few examples of a starting point of how I'd like a demultiplexor to look and feel. More details and rationale then. Aaron W. LaFramboise

On Mon, 13 Sep 2004 15:53:05 -0500, Aaron W. LaFramboise wrote
I don't follow how my interface was monolithic. Unlike most multiplexor designs I've seen, it didn't have any mention of sockets, file i/o, etc. It wasn't a singleton, either. Of course, it's just an idea -- I haven't built it -- perhaps it can't be done that way. But believe me, I'd love to see some alternate design suggestions -- I'm not locked into my current design thinking by any means. Any chance you are going to be able to post your design?
For one thing, absolutely everyone who works with events within that framework needs to know about your multiplexor class, even if they don't actually need notification of low-level events. This is unnecessary.
Well, I guess I don't see how that is the case -- but it's not really important because I was just posting a potential interface for discussion. I'd rather discuss working frameworks and use those as a starting point...
(There is one particular facet of my own implementation that I am unhappy with, which prevents me from seriously recommending it as a whole. It is present out of laziness, as I have not yet taken the time to figure out a solution.)
Later today, I think I'm going to post a few examples of a starting point of how I'd like a demultiplexor to look and feel. More details and rationale then.
Ok, great. It would be nice if this got posted to the Wiki -- on new pages or whatever. All the email gets dizzying after a while... Jeff

On Mon, Sep 13, 2004 at 10:18:00AM -0500, Aaron W. LaFramboise wrote:
But while that might be possible on GNU/Linux, it might be impossible on windows (for example). The demand to provide a portable interface therefore forces us to create (hidden) threads. If a user decides that he only needs one thread and the library is not allowed to implement the requested interface by running two or more threads internally, then how can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time? That is possible on linux, but I have failed to figure out how this can be done on windows :/
I am confused. What feature is missing on Windows? It is my perception that the Windows API is quite as expressive as anything Linux has.
No, I am confused. You act as if it is trivial. But where is the answer to my question? How can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time? Is your answer WaitForMultipleObjects? Then 1) what about the limit of 64? And 2) how can I use that to wait for events on SOCKETs? The only thing I can find for that is WSAWaitForMultipleEvents. But obviously you cannot call WSAWaitForMultipleEvents *and* WaitForMultipleObjects at the same time (without using threads). -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
On Mon, Sep 13, 2004 at 10:18:00AM -0500, Aaron W. LaFramboise wrote:
But while that might be possible on GNU/Linux, it might be impossible on windows (for example). The demand to provide a portable interface therefore forces us to create (hidden) threads. If a user decides that he only needs one thread and the library is not allowed to implement the requested interface by running two or more threads internally, then how can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time? That is possible on linux, but I have failed to figure out how this can be done on windows :/
I am confused. What feature is missing on Windows? It is my perception that the Windows API is quite as expressive as anything Linux has.
No, I am confused. You act as if it is trivial. But where is the answer to my question? How can I implement an interface that allows one to wait for events on 100 sockets, a timer, a few named pipes, a fifo and some large diskfile I/O at the same time?
Sorry, I thought it was rhetorical.
- 100 sockets: implemented using I/O completion so they trigger with APCs: no slots used on NT or 9x with winsock 2.0 (but probably requires a separate thread on win9x with winsock 1.1 or earlier due to going over the limit (can winsock 1.1 on win9x even handle 100 sockets?))
- a timer: implemented internally, no handle slots needed.
- a few named pipes: implemented with APCs on NT, but 9x doesn't have named pipes at all.
- a fifo: I'm assuming you mean a normal pipe. As above, APCs on winNT; uses one handle slot on win9x.
- some large diskfile I/O: uses APCs, or on Win9x, one handle slot.
Total handles used: none on NT, two on win9x. That leaves 62 handles free. The key here is MsgWaitForMultipleObjectsEx(). (Both Msg- and -Ex are necessary.) Win95 mysteriously lacks this, so probably MsgWaitForMultipleObjects() should be used.
Is your answer WaitForMultipleObjects? Then 1) what about the limit of 64? And 2) how can I use that to wait for events on SOCKETs? The only thing I can find for that is WSAWaitForMultipleEvents. But obviously you cannot call WSAWaitForMultipleEvents *and* WaitForMultipleObjects at the same time (without using threads).
1) I do not see how a realistic use case would ever hit the limit. 2) I/O completion routines. Aaron W. LaFramboise

On Mon, Sep 13, 2004 at 03:27:12PM -0500, Aaron W. LaFramboise wrote:
> -100 sockets: implemented using I/O completion so they trigger with
> APC's [..snip..]
> 2) IO completion routines.
Thanks. Please remember that I have no experience with windows whatsoever. I never heard of IO completion ports before (this shouldn't be a problem, I can bite into a topic like a pitbull and in a month I should be an expert in the field ;). I asked my personal windows guru to give me a URL about IO completion ports and he told me: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/i_o_completion_ports.asp To my surprise, and possibly to your surprise as well ;), this starts as follows: <quote> I/O completion ports are the mechanism by which an application uses a pool of threads that was created when the application was started to process asynchronous I/O requests. These threads are created for the sole purpose of processing I/O requests. </quote> So... there we have our hidden threads! Did you know this? -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
On Mon, Sep 13, 2004 at 03:27:12PM -0500, Aaron W. LaFramboise wrote:
2) IO completion routines.
I asked my personal windows guru to give me an url about IO completion Ports and he told me: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base...
To my surprise, and possibly to your surprise as well ;), this starts as follows:
<quote> I/O completion ports are the mechanism by which an application uses a pool of threads that was created when the application was started to process asynchronous I/O requests. These threads are created for the sole purpose of processing I/O requests. </quote>
So... there we have our hidden threads! Did you know this?
Sorry, I've been overly loose with my terminology. I meant specifically "I/O completion routines," which aren't the same thing as "I/O completion ports." IOCR's are implemented using APCs, while IOCP's are implemented using special support functions, which are unfortunately not present on Win 9x at all. I have not seen any benchmark comparison of IOCR's (and other demultiplexing approaches) to IOCP's. I think I will put this on my research TODO list. I am advocating using IOCR's where possible in the general case. No threads are required here. I also think IOCP's could be useful, but only in specialized cases. In particular, it's my understanding that they are primarily useful in high performance situations on high-end server hardware with at least two CPUs. (I don't know if hyperthreading makes a difference here.) I think this would be best implemented by a custom multiplexor extension, perhaps through the template mechanism as suggested by Jeff. In any case, even with IOCP's, there should be no hiding the fact that additional threads are being created, and the user should be given as much control as possible in these specialized situations, as surely he will need it most then. Aaron W. LaFramboise

At 23:54 13/09/2004, you wrote:
> So... there we have our hidden threads!
> Did you know this?
They are not limited to just processing asynchronous I/O requests such as sockets / pipes et al read/writes.. they can also be used for user requests (ie a callback to a user function using one of the worker threads) as well... and hence you can build a very nice scalable multi-threaded framework from them. BTW I posted a usage example of user requests from a current project (it also uses async I/O.. though not shown via my post) back in Feb... see http://lists.boost.org/MailArchives/boost/msg01815.php Unfortunately since then I still haven't had time to put together a proposal, but from my pov (a win32 perspective) an iocp-like class needs to be in boost::threads before any progress can be made on serious async i/o / sockets / pipe / server libraries.
Regards, Mark

Carlo Wood wrote:
<quote> I/O completion ports are the mechanism by which an application uses a pool of threads that was created when the application was started to process asynchronous I/O requests. These threads are created for the sole purpose of processing I/O requests. </quote>
So... there we have our hidden threads!
The IOCP API does offer functions to do automatic thread pool management, but I never use it. For one thing, that API uses CreateThread and I prefer to use _beginthreadex. So while the docs may tell you to do things one way, I suggest ignoring them and doing it however you like :) Sean

On Mon, 13 Sep 2004 13:36:32 +0200, Carlo Wood wrote
It seems unlikely to me that there are many cases where the limit would be exceeded. However, in those cases, I don't think it would be a problem if the multiplex user were required to create another thread, and another multiplex. I don't think the multiplex should do this.
Why not? This is what libACE does too; I am interested to hear why you think it is wrong.
Well ACE does 'several things' depending on how it is used. On windows it provides as default the WFMO (wait for multiple objects reactor) and there is also a 'select' reactor. The WFMO reactor is more performant on windows and hence is the default. However, ACE doesn't remove the 64 object limit in the WFMO reactor, so you can run into the wall if you aren't careful. Part of the problem is, if you have something that is bumping up on the 64 object limit you might not get the performance you need from the 'select' reactor. So the problem isn't 'solved' by ACE the way you might imagine... Jeff

Carlo Wood <carlo <at> alinoe.com> writes:
2. It is unavoidable that this library uses threads.
On some platforms, you are almost certainly right - as an implementation detail. But I don't agree that threading needs to have anything in particular to do with the IO multiplexing lib.
There are two main reasons that I see:
1) On windows we have a limitation of at most 64 'Event' objects that can be 'waited' for at a time. This is not enough for large server applications that might need thousands of TCP/IP sockets.
Ok.
2) On windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleEvents (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general however - there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all different types are the same (i.e., 'int' on UNIX).
UNIX is a mess given various combinations of SysV IPC, posix, aio that works for disk and not much else (ymmv) etc with various extensions and implementation "details". I'm not sure that trying to support them all is the right direction. Can't we define some set of waitable objects/interfaces that can be implemented on top of whatever platform facilities work best (with a fallback/initial impl that uses whatever works at all on a reasonably basic platform)? This might do away with the "need" for threads (which are yet another source for incompatibility anyway) at some cost in flexibility.
Consider the major difference between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads.
Maybe (probably not for select, but perhaps for kqueue, epoll)? But why does this need to be reflected in the design beyond allowing (not requiring) many threads each with its own event dispatcher?
Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOstream functions) should happen in the same thread again; this new library should shield the use of threads for the user as much as possible!
You propose that threads may be needed as an impl detail that should be hidden. I agree - but I don't see how that is any different from suggesting any other platform dependent implementation detail - eg. futexes and epoll should/might be used on linux. Hopefully, the interface doesn't change.

I think you are saying you want an interface that allows the priority of a waitable object to be specified in some way, but that they get delivered in priority order to a single thread? So a minimal approach would be a single event receiver thread waiting on a non-prioritised system event interface, placing events into a priority queue for delivery to the "main" thread? That sounds like a reasonably portable approach, and one that can with little effort support multiple event receivers to address other issues you raise, but I don't think it should be visible at all to the user, or required by the design. I can also imagine cases where the overhead of such an approach would be excessive, and it would be preferable to use a more limited dispatcher with less overhead (or to use multiple such dispatchers, each operating independently, in their own thread, leaving any synchronisation issues between these threads up to the user).

Oh - you say it makes no sense to agree with higher numbered issues. I'm not sure why - this threads issue seems orthogonal to all the rest? Or are you saying that your impl. of 3+ relies on threads? Regards Darryl.

On Mon, Sep 13, 2004 at 04:51:04AM +0000, Darryl Green wrote:
2) On windows there are different types of handles/events. It seems to make a lot more sense to use different threads to wait for different types. For example, there is a WSAWaitForMultipleEvents (for sockets) and a WaitForMultipleObjects that allows one to wait for arbitrary events (but not socket events(?)). More in general however - there seems to be a need to use different ways to demultiplex and handle different types - even if the handles of all different types are the same (i.e., 'int' on UNIX).
UNIX is a mess given various combinations of SysV IPC, posix, aio that works for disk and not much else (ymmv) etc with various extensions and implementation "details". I'm not sure that trying to support them all is the right direction.
More likely impossible :). It makes sense to only support the most common ones that have equivalents on almost all operating systems. The ones that I can come up with are: signals, timers (of course) and then: files, sockets, pipes.
Can't we define some set of waitable objects/interfaces that can be implemented on top of whatever platform facilities work best (with a fallback/initial impl that uses whatever works at all on a reasonably basic platform)?
Isn't this what we are proposing to do? I've repeatedly compared this library with libevent. And that is doing exactly what you are saying here (if I understand you correctly).
This might do away with the "need" for threads (which are yet another source for incompatibility anyway) at some cost in flexibility.
How would this do away with the need for threads when one would, for example, want to wait for events on 1000 tcp sockets on windows? Moreover, how would you handle waiting for I/O on sockets and at the same time read and write large files on disk (an operation that takes a minute in total, say), on windows? Finally, we need to support users that WANT to use more than one thread. What if a user wants to handle high priority events in one thread - and lesser priority events in a different thread? We must be thread aware in that case imho. libACE uses the singleton pattern for its Reactor object. That seems to make sense, but it also demands that locking is used when accessing that object.
Consider the major difference between a listen socket, a very busy UDP socket and a very busy memory-mapped file descriptor. It might be easier to regulate priority issues between the different types by putting their dispatchers in separate threads.
Maybe (probably not for select, but perhaps for kqueue, epoll)? But why does this need to be reflected in the design beyond allowing (not requiring) many threads each with its own event dispatcher?
It doesn't, I have to agree. My main problem is that I cannot seem to figure out how to handle all possible events, including socket events, in a single thread on windows. If anyone could inform me how to do that, then the NEED for threads would disappear (especially when this method also evades the 64 limit).
Note however that I DO think that the callback functions for each event (that is, the moment we start calling IOstream functions) should happen in the same thread again; this new library should shield the use of threads for the user as much as possible!
You propose that threads may be needed as an impl detail that should be hidden.
Yes. Hidden when they are needed as impl detail, that is. The API might provide the means to explicitly communicate about threads with the library. But then it would be up to the user to use that or not; it would be user-created threads that are involved then. If the library creates threads, then no external calls will be done from those threads.
I agree - but I don't see how that is any different from suggesting any other platform dependent implementation detail - eg. futex's and epoll should/might be used on linux. Hopefully, the interface doesn't change.
Ok, perhaps I am too paranoid :). The result of this impl detail is that the application needs to link with boost.threads (and that that must be supported on the used platform) even when the user only wants to write a single threaded application. I was afraid that people would object to that. It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever.
I think you are saying you want an interface that allows the priority of a waitable object to be specified in some way, but that they get delivered in priority order to a single thread?
I want to reserve the possibility to implement things using threads. Just one impl detail could be that priorities will be implemented by using different threads for different priority levels. I am not sure yet if that will be needed however.
So a minimal approach would be a single event receiver thread waiting on a non-prioritised system event interface, placing events into a priority queue for delivery to the "main" thread? That sounds like a reasonably portable approach, and one that can with little effort support multiple event receivers to address other issues you raise, but
I think we agree on what I wanted to raise with 'point 2'.
I don't think it should be visible at all to the user, or required by the design.
Hmm, I'd like to stick to "It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever."
I can also imagine cases where the overhead of such an approach would be excessive, and it would be preferable to use a more limited dispatcher with less overhead (or to use multiple such dispatchers, each operating independently, in their own thread, leaving any synchronisation issues between these threads up to the user).
We'll go for what is best - they are implementation details, as you say :). As for the user doing synchronisation - the user will also get the option to explicitly create threads and start to wait for and dispatch events in his own threads; if he chooses to use the interface in that way then he will have the flexibility to do what you say.
Oh - you say it makes no sense to agree with higher numbered issues. I'm not sure why - this threads issue seems orthogonal to all the rest? Or are you saying that your impl. of 3+ relies on threads?
I think I put point 2 this early because it is a very important issue for me. If you'd not agree with it then we have to discuss it with the highest priority until we DO agree - or the other points make no sense to waste time on, simply because the whole project would be killed already by the disagreement on point 2. There is no other, technical, relationship. -- Carlo Wood <carlo@alinoe.com>

On Mon, 13 Sep 2004, Carlo Wood wrote:
This might do away with the "need" for threads (which are yet another source for incompatibility anyway) at some cost in flexibility.
How would this do away with the need for threads when one would, for example, want to wait for events on 1000 tcp sockets on windows? Moreover, how would you handle waiting for I/O on sockets and at the same time read and write large files on disk (an operation that takes a minute in total, say), on windows?
I suppose if you really wanted to you could loop over multiple calls to WFMO until all of your (>64) events had been processed, but this wouldn't scale well and would risk having data back up if the number of events got too large. But it is just about the only way to handle 1000 simultaneous events in a single thread in Windows. Still, I'd wonder what the point of using WFMO is when MS provides completion ports. Is there a need to service more than just file and socket i/o, or is it just a matter of avoiding the complexities of multithreading?
It doesn't, I have to agree. My main problem is that I cannot seem to figure out how to handle all possible events, including socket events, in a single thread on windows. If anyone could inform me how to do that, then the NEED for threads would disappear (especially when this method also evades the 64 limit).
See above. Basically, I don't think there is a truly practical way to do this in a single-threaded Windows application.
Ok, perhaps I am too paranoid :). The results from this impl detail is that the application needs to link with boost.threads (and that that must be supported on the used platform) even when the user only wants to write a single threaded application. I was afraid that people would object to that. It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever.
You could always have wrapper classes for the synchronization mechanisms and use boost.threads or skeleton code depending on whether a single or multithreaded app were being compiled. But the user would have to be aware of scaling limitations that may be inherent in a single-threaded version of the code.
Hmm, I'd like to stick to "It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever."
And it should be entirely possible to hide all of this from the user. Use a thread pool that grows as needed and put all the synchronization stuff in interface methods.
We'll go for what is best - they are implementation details, as you say :). As for the user doing synchronisation - the user will also get the option to explicitly create threads and start to wait for and dispatch events in his own threads; if he chooses to use the interface in that way then he will have the flexibility to do what you say.
It might be nice if you offered a means for users to get into the guts of the lib if they wanted to. Some folks may want to do lockless i/o. Sean

On Mon, Sep 13, 2004 at 11:05:01AM -0700, Sean Kelly wrote:
I suppose if you really wanted to you could loop over multiple calls to WFMO until all of your (>64) events had been processed,
Suppose an application has 640 sockets. It groups those in ten groups of 64 sockets each and calls WFMO for the first group. Suppose there are no events in that group. Then what about the other 576 sockets? This solution would mean you'd need to use a timeout for each group, let's say 10 ms - otherwise it takes too much cpu. And then the application becomes unresponsive: it takes 100 ms before it even LOOKS again at the first group of sockets. Once we'd increase the number of sockets, this would become totally impractical. If this is the only solution next to threading then I will choose threading any time.
but this wouldn't scale well and would risk having data back up if the number of events got too large. But it is just about the only way to handle 1000 simultaneous events in a single thread in Windows. Still, I'd wonder what the point of using WFMO is when MS provides completion ports.
I only learned about those since the author of asio mentioned them. This seems the way to go.
Is there a need to service more than just file and socket i/o, or is it just a matter of avoiding the complexities of multithreading?
It doesn't, I have to agree. My main problem is that I cannot seem to figure out how to handle all possible events, including socket events, in a single thread on windows. If anyone could inform me how to do that, then the NEED for threads would disappear (especially when this method also evades the 64 limit).
See above. Basically, I don't think there is a truly practical way to do this in a single-threaded Windows application.
That is what I was thinking too, hence the Subject line of this thread.
Ok, perhaps I am too paranoid :). The results from this impl detail is that the application needs to link with boost.threads (and that that must be supported on the used platform) even when the user only wants to write a single threaded application. I was afraid that people would object to that. It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever.
You could always have wrapper classes for the synchronization mechanisms and use boost.threads or skeleton code depending on whether a single or multithreaded app were being compiled. But the user would have to be aware of scaling limitations that may be inherent in a single-threaded version of the code.
Agreed.
Hmm, I'd like to stick to "It never was my intention to force a user to create threads himself or be bothered with any locking or semaphores or whatsoever."
And it should be entirely possible to hide all of this from the user. Use a thread pool that grows as needed and put all the synchronization stuff in interface methods.
Definitely, but Aaron will oppose this idea. He is against using threads even when they are completely hidden from the user. If it is possible to completely avoid threads using completion ports, then I am all for it.
We'll go for what is best - they are implementation details, as you say :). As for the user doing synchronisation - the user will also get the option to explicitly create threads and start to wait for and dispatch events in his own threads; if he chooses to use the interface in that way then he will have the flexibility to do what you say.
It might be nice if you offered a means for users to get into the guts of the lib if they wanted to. Some folks may want to do lockless i/o.
Agreed. Personally I think it should be possible to fine tune a library like this to the last bit. But it should not be at the cost of ease of use for the beginner, nor should it cost other trade offs. -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
On Mon, Sep 13, 2004 at 11:05:01AM -0700, Sean Kelly wrote:
And it should be entirely possible to hide all of this from the user. Use a thread pool that grows as needed and put all the synchronization stuff in interface methods.
Definitely, but Aaron will oppose this idea. He is against using threads even when they are completely hidden from the user. If it is possible to completely avoid threads using completion ports, then I am all for it.
Sort of. IOCP (completion ports) requires at least one thread to wait on a GetQueuedCompletionStatus call until data is ready to be processed. I typically do this with a worker thread or thread pool running in a fairly tight loop--data is consumed, broken into messages, and stuck in a synchronized queue for further processing by the main thread. But GetQueuedCompletionStatus does allow for a timeout parameter, so that worker thread could theoretically be the main thread. The only risk is that a server with a massive number of connections could possibly lose data if the worker thread spends too much time doing other things. So in a sense it's like WFMO but with no limitation on the number of events that can be monitored. Another consideration is that accept() is typically a blocking operation, so if you want to make a single-threaded socket server you would probably have to use AcceptEx, which can complicate the design a bit. I'm personally not a fan of AcceptEx and so let a thread block on accept() instead :)
It might be nice if you offered a means for users to get into the guts of the lib if they wanted to. Some folks may want to do lockless i/o.
Agreed. Personally I think it should be possible to fine tune a library like this to the last bit. But it should not be at the cost of ease of use for the beginner, nor should it cost other trade offs.
Agreed. Sean

On Mon, Sep 13, 2004 at 09:11:46PM -0700, Sean Kelly wrote:
Sort of. IOCP (completion ports) requires at least one thread to wait on a GetQueuedCompletionStatus call until data is ready to be processed. I typically do this with a worker thread or thread pool running in a fairly tight loop--data is consumed, broken into messages, and stuck in a synchronized queue for further processing by the main thread. But GetQueuedCompletionStatus does allow for a timeout parameter, so that worker thread could theoretically be the main thread. The only risk is that a server with a massive number of connections could possibly lose data if the worker thread spends too much time doing other things.
Can you explain in detail how it would be possible to lose data? I understood that when an IO operation finishes, an 'IO completion packet' is sent to the 'IO completion port' and queued there. If no thread is currently waiting in GetQueuedCompletionStatus then still nothing is lost: it will be queued until GetQueuedCompletionStatus is called again.

The picture I got is this: There are a number of threads (if you have 4 cpu's then 4 makes sense) that are all waiting in GetQueuedCompletionStatus. At one moment some IO operation that was requested before by any thread completes and that fact is sent to the IO completion port. The last thread that entered GetQueuedCompletionStatus will get it (this is so there is no unnecessary context switching needed: if one thread can keep up with the completed IO operations then only one thread runs). Now, if this thread starts handling the IO operation and does not return to its GetQueuedCompletionStatus before another IO operation completes, then the next thread starts running etc. But if there are no threads waiting, then the completion packet is simply queued. Sure, if the queue can run full and overflow... then I guess we'd lose data - but that just means that the PC can't keep up with processing the IO using just one cpu. The user then should have the possibility to allow using more than one cpu (if he has them). I find this a rather elegant solution for multi-processor computers. However, I don't have a multi-processor PC, so I am not insisting on using IOCP :p

Also, I now understand that this solution is *specifically* MT. It forces the user to write an MT application because his IO handling is handled in parallel by more than one thread. This cannot be done without the user being aware of it. IOCP could be used in a way that only one thread is actually handling the IO (that is, calling the 'callback functions' of the user), but then using IOCP makes no sense. -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
On Mon, Sep 13, 2004 at 09:11:46PM -0700, Sean Kelly wrote:
Sort of. IOCP (completion ports) requires at least one thread to wait on a GetQueuedCompletionStatus call until data is ready to be processed. I typically do this with a worker thread or thread pool running in a fairly tight loop--data is consumed, broken into messages, and stuck in a synchronized queue for further processing by the main thread. But GetQueuedCompletionStatus does allow for a timeout parameter, so that worker thread could theoretically be the main thread. The only risk is that a server with a massive number of connections could possibly lose data if the worker thread spends too much time doing other things.
Can you explain in detail how it would be possible to lose data? I understood that when an IO operation finishes, an 'IO completion packet' is sent to the 'IO completion port' and queued there. If no thread is currently waiting in GetQueuedCompletionStatus then still nothing is lost: it will be queued until GetQueuedCompletionStatus is called again.
I was thinking of the socket recv buffer (SO_RCVBUF). Though I suppose this only holds true for UDP comms, so perhaps this isn't a big deal.
Sure, if the queue can run full and overflows ... then I guess we'd lose data - but that just means that the PC can't keep up with processing the IO using just one cpu. The user then should have the possibility to allow using more than one cpu (if he has them).
Agreed.
I find this a rather elegant solution for multi-processor computers. However, I don't have a multi-processor PC, so I am not insisting on using IOCP :p
Thing is, IOCP is really the only way to do high-end i/o multiplexing in Windows. I don't think the other options would scale well to thousands of connections.
Also, I now understand that this solution is *specifically* MT. It forces the user to write an MT application because his IO handling is handled in parallel by more than one thread. This cannot be done without the user being aware of it. IOCP could be used in a way where only one thread actually handles the IO (that is, calls the user's 'callback functions'), but then using IOCP makes no sense.
I disagree. IOCP makes sense any time the user expects the need to handle more than 31 simultaneous connections. Standard overlapped i/o may be fast but it doesn't scale as well as IOCP. Sean

On Tue, Sep 14, 2004 at 07:56:49PM -0700, Sean Kelly wrote:
Thing is, IOCP is really the only way to do high-end i/o multiplexing in Windows. I don't think the other options would scale well to thousands of connections.
Also, I now understand that this solution is *specifically* MT. It forces the user to write an MT application because his IO handling is handled in parallel by more than one thread. This cannot be done without the user being aware of it. IOCP could be used in a way where only one thread actually handles the IO (that is, calls the user's 'callback functions'), but then using IOCP makes no sense.
I disagree. IOCP makes sense any time the user expects the need to handle more than 31 simultaneous connections. Standard overlapped i/o may be fast but it doesn't scale as well as IOCP.
That would be a very interesting observation, but - what about IOCR (IO Completion Routines)? It seems to me that both have the same underlying mechanism and therefore I expect IOCR to scale equally well. -- Carlo Wood <carlo@alinoe.com>

On Wed, 15 Sep 2004, Carlo Wood wrote:
On Tue, Sep 14, 2004 at 07:56:49PM -0700, Sean Kelly wrote:
I disagree. IOCP makes sense any time the user expects the need to handle more than 31 simultaneous connections. Standard overlapped i/o may be fast but it doesn't scale as well as IOCP.
That would be a very interesting observation, but - what about IOCR (IO Completion Routines)? It seems to me that both have the same underlying mechanism and therefore I expect IOCR to scale equally well.
True enough. I couldn't remember the reason I opted not to use completion routines in the past--it may just have been that I didn't like the program flow that method required--so I decided to do some digging. I haven't verified this with other sources, but found an interesting quote here (http://www.certmag.com/bookshelf/c06615799.pdf): "Overlapped I/O with callbacks is not an option for several reasons. First, many of the Microsoft-specific extensions do not allow Asynchronous Procedure Calls (APCs) for completion notification. Second, due to the nature of how APCs are handled on Windows, it is possible for an application thread to starve. Once a thread goes into an alertable wait, all pending APCs are handled on a first in first out (FIFO) basis. Now consider the situation in which a server has a connection established and posts an overlapped WSARecv with a completion function. When there is data to receive, the completion routine fires and posts another overlapped WSARecv. Depending on timing conditions and how much work is performed within the APC, another completion function is queued (because there is more data to be read). This can cause the servers thread to starve as long as there is pending data on that socket." Still, completion routines do seem entirely reasonable, though the code will obviously require synchronization just like IOCP would. Sean

3. User programs compiled for platform A, and linking with a shared versions of this library - do not need to be able to run on platform B (with same architecture) without recompilation.
I don't see any benefit in allowing any kind of binary portability. Actually, I think this is a rather trivial issue, as I think most of boost already enforces this. It would restrict the design enormously and put unnecessary strain on the efficiency of the implementation to maintain an ABI interface for this library. Just added this topic to make this clear once and for all ;). -- Carlo Wood <carlo@alinoe.com>

Carlo Wood wrote:
3. User programs compiled for platform A, and linking with a shared versions of this library - do not need to be able to run on platform B (with same architecture) without recompilation.
I don't see any benefit in allowing any kind of binary portability. Actually, I think this is a rather trivial issue, as I think most of boost already enforces this. It would restrict the design enormously and put unnecessary strain on the efficiency of the implementation to maintain an ABI interface for this library. Just added this topic to make this clear once and for all ;).
What do you mean by 'platform' and what do you mean by 'architecture'? Is this non-guarantee different from the usual non-guarantee of different compilers being binary incompatible? Aaron W. LaFramboise

On Mon, Sep 13, 2004 at 12:55:54AM -0500, Aaron W. LaFramboise wrote:
Carlo Wood wrote:
3. User programs compiled for platform A, and linking with a shared versions of this library - do not need to be able to run on platform B (with same architecture) without recompilation.
I don't see any benefit in allowing any kind of binary portability. Actually, I think this is a rather trivial issue, as I think most of boost already enforces this. It would restrict the design enormously and put unnecessary strain on the efficiency of the implementation to maintain an ABI interface for this library. Just added this topic to make this clear once and for all ;).
What do you mean by 'platform' and what do you mean by 'architecture'? Is this non-guarantee different from the usual non-guarantee of different compilers being binary incompatible?
I am afraid that this whole point is too trivial; I shouldn't have added it: it's not worth the confusion. What I am trying to say is that an application that uses boost.multiplexor will need a total recompile on every platform it is supposed to run on - not only a separate compile of the multiplexor library, but a separate compile of the application itself as well. There will be no way that any part is binary portable. Basically one could demand that a statically linked application that runs on intel - but uses 'dlopen' to load the multiplexor library - should be able to run on both linux and i386-solaris without recompilation. Apart from that being a ridiculous demand - I think it will be impossible to write the library in such a way that this can be supported. I don't think it's worth the time to go into the reason for that observation ;). -- Carlo Wood <carlo@alinoe.com>

4. It is unacceptable that we would depend on other libraries (libevent, libACE) for this, except standard libraries (libc, libstdc++, socket, ws2_32, etc).
I think this is already a boost policy(?). It seems common sense to me. Depending on other libraries would cause a maintenance nightmare, not to mention the license problems. So this is also a trivial topic(?) -- Carlo Wood <carlo@alinoe.com>

5. We should heavily lean on the expertise that is condensed in libACE for things like the following: There must be a concept "Event Handler" (the 'call back'). There must be a concept "Acceptor", "Connector" and "Handle" types.
The userbase of libACE is immense, impressive, and backed up by a lot of very knowledgeable and experienced coders. Their input (in the form of libACE) cannot lightly be ignored, and we owe it to ourselves to study their design and implementation thoroughly and learn from it. -- Carlo Wood <carlo@alinoe.com>

6. The ideas about I/O filtering as discussed for Apache (see http://www.serverwatch.com/tutorials/article.php/1129721) need to be analyzed and probably implemented in IOStreams before we can really design the demultiplexor library.
This is less trivial, I am afraid, but nevertheless an important design issue, as you will agree. I think that the author of IOStreams, Jonathan, agrees with me that the Apache people's ideas about how filtered data should be treated are something important that needs support in a future version of IOStreams. The idea basically means that data being passed on to filters is 'tagged' with a type that signifies its origin; for example sockets, memory mapped files, a disk file, etc. I think that the way we incorporate these 'types' into the data/objects that are passed on will have a significant influence on how we need to design event dispatching for the different types.

For example, an application that both handles sockets and reads and processes a very large input file (of a few GB, say) might spend so much time on processing the file that it neglects the sockets too much. I already brought up the need for threads for this (see point 2). A 'file dispatcher' thread could for example read only a given amount of data per time unit (limit the bandwidth). In most current implementations there are two approaches:

1) The file is processed (read) until read(2) would block; that approach likely leads to processing the whole file.

2) The file is read until its buffer is "full" - then the application continues with the main loop, which processes all data in the buffer, handles ONE cycle of sockets, and then starts to read the file again. This only works when there is a correct balance between the need to handle the sockets and the size of the file buffer. If the buffer were growable without limits then this certainly won't work.

We cannot rely on arbitrary ways of distributing CPU over different sources like this. A more controlled priority management is needed. -- Carlo Wood <carlo@alinoe.com>

On Sun, Sep 12, 2004 at 01:37:56PM +0200, Carlo Wood wrote:
this (see point 2). A 'file dispatcher' thread could for example
I meant: "(see topic 2)", not the point 2 that is listed below this line.

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040912110617.GA20197@alinoe.com...
Hiya all.
I already briefly discussed this with Jonathan Turkan and I understood he is interested to continue on a new project when
*if* ... but I'm hopeful
IOStreams is accepted. I'd like to work together with him in this new project.
Hi Carlo, Yes, I have been considering how to expand the iostreams library to handle different i/o models, and I believe standard library streams and stream buffers won't be the appropriate abstractions in some cases. I'm also intrigued by the idea of trying to achieve interoperability with Hugo Duncan's library. But right now I need a break :-) If the library is accepted, my focus in my spare time will be to revise the library based on review comments. The documentation, in particular, seems to need major work. Maybe in a couple of months I'll be ready to start thinking about generalizations .... Best Regards, Jonathan

On Sun, Sep 12, 2004 at 01:42:25PM -0600, Jonathan Turkanis wrote:
But right now I need a break :-) If the library is accepted, my focus in my spare time will be to revise the library based on review comments. The documentation, in particular, seems to need major work.
I understand that you need to spend a lot of time on IOstreams. There have been a lot of comments - and obviously those need to be processed somehow.
Maybe in a couple of months I'll be ready to start thinking about generalizations ....
Right now I am stuck - I need a demultiplexor library that works on at least windows and linux BADLY. I will have to make a start with this because there is not really anything else I can do (except, say, if I'd quit working on my current project and continue with my ECC project that was put in the freezer a while ago). I hope you can find the time to briefly comment on the major decision points that I will bring up on this list; it would not be efficient if I started to do work on this library in a direction that you disagree with on a fundamental level. -- Carlo Wood <carlo@alinoe.com>

"Carlo Wood" <carlo@alinoe.com> wrote in message:
I hope you can find the time to briefly comment on the major decision points that I will bring up on this list; it would not be efficient if I started to do work on this library in a direction that you disagree with on a fundamental level.
Okay -- but give me a few days :-) Jonathan

On Sun, 12 Sep 2004 22:15:46 +0200, Carlo Wood wrote
Right now I am stuck - I need a demultiplexor library that works on at least windows and linux BADLY. I will have to make a start with this because there is not really anything else I can do (except when say, I'd quit working on my current project and continue with my ECC project that has been put in the freezer a while ago).
Well, if you have time and can take the risk of using pre-beta libs then I would encourage you to help on boost.socket et al. Honestly, I think there isn't much overlap between boost.socket and IOStreams. In previous discussion, we hashed around the idea of having a streambuf and stream to use with sockets -- it's been so long since I looked at it I'm not sure what if anything got implemented. Also, if it's just multiplexing then there were a few ideas we discussed -- again I don't think there's an implementation of all these things yet. http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?BoostSocket/M... If you can't wait or depend on pre-beta software then really you should consider using ACE. There are lots of things about ACE that are ugly, but there are plenty of things to like. It's robust, cross-platform, and used on real projects everywhere -- you can count on it actually working. I still want to see a modern replacement adopted into boost, but you have to figure it would be at least a year out -- assuming someone was really working on it :-( Jeff

On Sun, Sep 12, 2004 at 02:31:17PM -0700, Jeff Garland wrote:
would encourage you to help on boost.socket et. al.
Are you the author of boost.socket? I mailed to the email address given on sourceforge (the giallo project) but got no response. I assumed the project was dead.
Honestly, I think there isn't much overlap between boost.socket and IOStreams. In previous discussion, we hashed around the idea of having a streambuf and stream to use with sockets -- it's been so long since I looked at it I'm not sure what if anything got implemented.
I am interested in designing a standalone multiplexor library. It should merely *support* IOStreams - it should be possible to use the two together in a seamless way. The multiplexor library itself, however, is not really related to streambufs imho.
Also, if it's just multiplexing then there were a few ideas we discussed -- again I don't think there's implementation of all these things yet.
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?BoostSocket/M...
Yes I already read that.
If you can't wait or depend on pre-beta software then really you should consider using ACE. There's lots of things about ACE that are ugly, but there are plenty of things to like. It's robust, cross-platform, and used on real projects everywhere -- you can count on it actually working.
Did you read my previous posts? Please do so. Have a look at point 5. Please reply to point 1 through 6 in that order (and only post in a next thread when you basically agree with the previous thread).
I still want to see a modern replacement adopted into boost, but you have figure it would be at least a year out -- assuming someone was really working on it :-(
I am about 20 times faster than most people (according to my boss) :) On the other hand, I think I'll need 2 months for the code - and another 1 or 2 months for documentation and examples, because I am not familiar with windows (yet) (if it were only unix then 365/20 = 18 days would indeed be more than enough). However, lots and lots of time will be spent waiting for people to reply to posts - and RE-posting the same things over and over because they don't read things very well :p. I am not sure yet if I will have the patience for that :/ - we'll see how that goes. I had planned to wait with further development till several people had replied to the 6 threads that I started - but after 24 hours still no real reply has been posted, and not doing anything for more than a day is not acceptable. I guess I will have to continue without feedback then :(. Unfortunately, additional posts (like a point 7, 8, etc.) don't make sense without feedback - because they would be 'fuzzy' brainstorm things; lots of feedback back and forth with a very small delay will be necessary to make any progress imho. Perhaps, if you are the author of boost.socket, we should do this in a private mail exchange? -- Carlo Wood <carlo@alinoe.com>

On Mon, 13 Sep 2004 00:13:44 +0200, Carlo Wood wrote
On Sun, Sep 12, 2004 at 02:31:17PM -0700, Jeff Garland wrote:
would encourage you to help on boost.socket et. al.
Are you the author of boost.socket? I mailed to the email address given on sourceforge (the giallo project) but got no response. I assumed the project was dead.
No, but I helped with some concept development and contributed ideas. Hugo Duncan was the lead developer -- he moved it off to giallo in April of this year. I never really understood why....
Honestly, I think there isn't much overlap between boost.socket and IOStreams. In previous discussion, we hashed around the idea of having a streambuf and stream to use with sockets -- it's been so long since I looked at it I'm not sure what if anything got implemented.
I am interested in designing a standalone multiplexor library. It should merely *support* IOStreams - it should be possible to use the two together in a seamless way. The multiplexor library itself, however, is not really related to streambufs imho.
Also, if it's just multiplexing then there were a few ideas we discussed -- again I don't think there's implementation of all these things yet.
http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?BoostSocket/M...
Yes I already read that.
Good. I don't think that the discussion is at its end there. Boris put together the idea -- obviously the discussion there is very network-programming centric, and then things kind of sputtered out. So if you are thinking of multiplexing you must have other things in mind, like timers, signals, etc? Anyway, I can see that perhaps there can be a multiplexing core that can have other elements added as they are developed. As for the I/O piece, it seems to me that the relationship is with the buffer and not the stream. The stream is there for formatting and the buffer is there for data management. To handle async I/O and such it seems to me that the buffer needs to be enhanced -- the stream can be as well, but foundationally the responsibility has to start in the buffer.
Did you read my previous posts? Please do so. Have a look at point 5. Please reply to point 1 through 6 in that order (and only post in a next thread when you basically agree with the previous thread).
Probably not...
I still want to see a modern replacement adopted into boost, but you have figure it would be at least a year out -- assuming someone was really working on it :-(
I am about 20 times faster than most people (according to my boss) :) On the other hand - I think I'll need 2 months for the code - and another 1 or 2 months for documentation and examples, because I am not familiar with windows (yet) (if it were only unix then 365/20 = 18 days would indeed be more than enough).
Yeah, I didn't mean that it would be a year of development. I meant it would be a year before it could be in boost with reviews and such...
However - lots and lots and lots of time will be spent waiting for people to reply to posts - and RE-posting the same things over and over because they don't read things very well :p. I am not sure yet if I will have the patience for that :/ - we'll see how that goes. I had the plan to wait with further development till several people had replied to the 6 threads that I started - but after 24 hours still no real reply has been posted, and not doing anything for more than a day is not acceptable. I guess I will have to continue without feedback then :(.
You might not get immediate replies -- especially on a weekend. And double especially on Sept 11.
Unfortunately - additional posts (like a point 7, 8 etc. do make sense without feedback - because they would be 'fuzzy' brainstorm things, lots of feedback back and forward with a very small delay will be necessary to make any progress imho. Perhaps, if you are the author of boost.socket, we should do this in a private mail exchange?
I'm willing to discuss it offline. That's often how the most progress is made on new libraries... Jeff

Hi Carlo, --- Carlo Wood <carlo@alinoe.com> wrote:
Right now I am stuck - I need a demultiplexor library that works on at least windows and linux BADLY. I will have to make a start with this because there is not really anything else I can do (except when say, I'd quit working on my current project and continue with my ECC project that has been put in the freezer a while ago).
May I suggest that you check out asio, a library I have developed for network programming using a "modern" C++ style. I have been thinking about ideas similar to the ones you raised in this thread for quite a while now, and as I had a similar need to yours, I developed asio. I believe it addresses the issues you raise, or at least provides a basis for doing so. The asio library currently supports both windows and linux, is free for any use, and is also going into production in a commercial system in about a month. You can get asio from: http://tenermerx.com/programming/cpp/asio/asio-0.1.12.tar.gz I have also put the generated documentation online at: http://tenermerx.com/programming/cpp/asio/asio-0.1.12/doc/html/ Feel free to ask if you have any questions. There's too much to say about all the different stuff in asio to fit in this email :) Regards, Chris

On Mon, Sep 13, 2004 at 04:11:40PM +1000, Christopher Kohlhoff wrote:
Hi Carlo,
Hi!
May I suggest that you check out asio, a library I have developed for network programming using a "modern" C++ style.
Thanks! I am going to spend a few days digging in your source code! If this turns out to be what I need then you just killed the boost.multiplexor project of course :p. Otherwise I will be pleased to use as much of your experience (and code) as possible, and I hope you will be willing to give us a hand in perfecting the library for use in boost. -- Carlo Wood <carlo@alinoe.com>

Carlo Wood <carlo@alinoe.com> writes:
Each of these Sources will need to be handled by the Operating System; for example if they are sockets.
It isn't obvious to me that all Sources need to be handled by the OS. I can easily imagine sources that are not OS services. Or were you saying something else? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sun, Sep 12, 2004 at 09:53:49PM -0400, David Abrahams wrote:
Carlo Wood <carlo@alinoe.com> writes:
Each of these Sources will need to be handled by the Operating System; for example if they are sockets.
It isn't obvious to me that all Sources need to be handled by the OS. I can easily imagine sources that are not OS services. Or were you saying something else?
Ok, let me rephrase that. Forget IOStreams and Sources for a moment ;). I think there is a need for a (minimal) event demultiplexor library. Basically something like libevent, which works on most UNIX OSes and is built around the best demultiplexing system call available on a given OS, while hiding the differences between the interfaces used (select, poll, epoll, kqueue and more). As a result of using libevent it is possible to write an application that runs very efficiently on both linux (epoll when the kernel is recent enough, and select or poll when compiling libevent on an older system) and FreeBSD (kqueue), without the user being bothered with the differences between those interfaces.

What is missing in libevent is support for windows; it depends on the fact that all devices on all UNIX systems have the same interface:

  int fd = open_and_or_create_some_device(initialization);
  ioctl(fd, more_initialization);
  special_function(fd, more_stuff);

And that works on *all* UNIX OSes for all devices in a sufficiently portable way (although you still need #ifdef-ed code for certain things, like setting file descriptors to non-blocking, which comes in basically two flavours: SYSV and BSD). The only thing that libevent therefore needs to support is the actual demultiplexing of events on 'fd' (an 'int' for EVERY device) and a timeout with at least millisecond resolution.

This interface is obviously not portable enough for boost. Even if we only consider adding support for windows, a major change of the available interface is needed. In order to provide a portable interface that includes non-UNIX OSes it is necessary to include the concept of certain common Device classes (i.e. files, sockets, pipes) and provide an abstraction for them as well.

Ok, back to IOStreams. Now note that many of those devices can serve as a source or sink of a stream of data. This indicates that they will both be a notion within this multiplexor library and in the IOStreams library (as either a possible Source or Sink).
That doesn't mean that EVERY Source or Sink is an OS-handled device. But it does mean that we need to keep in mind that users will likely want to use 'Devices' that can be used with both boost.multiplexor AND boost.iostreams in an easy and intuitive way. -- Carlo Wood <carlo@alinoe.com>
participants (10)
- Aaron W. LaFramboise
- Carlo Wood
- Christopher Kohlhoff
- Darryl Green
- David Abrahams
- Jeff Garland
- Jonathan Turkanis
- Mark Blewett
- Sean Kelly
- Tony Juricic