Re: [network] An RFC posted to the sandbox

Hi Caleb, On 4/12/05, Don G wrote:
I've posted a zip with an HTML file and an HPP file that describes an abstract way to interact with the network.
Overall it looks pretty nice.
Thanks. :) And thanks for taking the time to look at the proposal.
A few observations: * port_t should be unsigned short, not unsigned int
That is, of course, true of TCP/IP :)
* timeout constructors are inconsistent. One takes seconds + microseconds and one takes milliseconds, at least according to the names of the arguments.
Yeah, that was one of the adjustments I tried to make as I boost-ified the code. The Windows way is millisecs, but Unix is microsecs. To do microsecs, I think we would have to use uint64's. That would be fine by me, but I wasn't sure how widely supported it is. I know that for Solaris (SunPro), gcc, MSVC, CodeWarrior, uint64 is good. Of course, a timespec doesn't use uint64. La de da... :)
* I wouldn't call the type you pass to the network::new_{local,broadcast,loopback,}_address method a "url". It isn't one.
But it is, at least according to the description in RFC 2396. You can pass this: http://user:pswd@www.boost.org:8080/foo?x=42 and it would create an address object given that description. The resolved (physical) address would be something like: http://user:pswd@66.35.250.210:8080/foo?x=42
Perhaps "address_specifier" would be better? I can imagine some sort of "stream factory" class with pluggable protocols (e.g. http, https, ftp, etc) that would take fully qualified URLs to create new streams, but in this case you're dealing with at most a hostname and port.
Well, the semantics of such things quickly become non-streams. The intention of the scheme ("http") is to establish a port by means of name lookup. In the end, yes the URL is resolved to address/port. If an HTTP library was built on top, it would do no transformation before passing its URL to new_address(). Hence, the type "url".
I'd contend that in general, an operating system supplied multiplexing facility will scale better than one that uses a thread to handle each connection.
Absolutely. The code I have in mind uses a thread pool and connections are managed by as few a number of threads as necessary. On Windows, each I/O thread can handle 64-ish connections. On Unix, it is much higher. By default, I create one I/O thread per processor and divvy up the connections.
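The sizing rule described above (one I/O thread per processor, more only when a thread's multiplexing limit is exceeded) can be sketched as simple arithmetic. This is only an illustration: the function name `io_threads_needed` and its parameters are hypothetical and do not appear in the proposal, and 64 is just the "64-ish" Windows figure quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the proposal: how many I/O threads are
// needed if we start with one thread per processor and each thread can
// multiplex at most `per_thread_limit` connections.
inline std::size_t io_threads_needed(std::size_t connections,
                                     std::size_t processors,
                                     std::size_t per_thread_limit)
{
    assert(per_thread_limit > 0);
    // Ceiling division: threads forced by the multiplexing limit alone.
    const std::size_t by_limit =
        (connections + per_thread_limit - 1) / per_thread_limit;
    // Never go below the default of one thread per processor.
    return std::max(processors, by_limit);
}
```

With a limit of 64, 1000 connections on a dual-processor machine would need 16 threads, while 100 connections on a four-processor machine would stay at the default of 4.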
Without some sort of multiplexing facility, how do you know when a channel is ready for I/O? It seems that your proposal is to use a pool of threads to handle async/non-blocking operations, but I don't see any interfaces defined to control or manage these operations. Is that just TBD?
I have contemplated just how a user would want to interact with the thread pool behind the scenes, but, it is ideally an issue of minimal concern. Of course, for advanced users it could be something that needs to be tweaked. In that sense, there is a TBD aspect. I just have yet to see a good abstraction/mechanism for doing this.
Clearly Windows select and WFMO are not as scalable as UNIX select/poll/etc,
In one sense WFMO could be better (it would not be constantly repopulating fd_sets), but select or IOCP (I/O Completion Ports) are it AFAIK. And IOCP is NT only (for those who care).
but I'd contend that if one is writing a serious networking application for Windows, you are going to make use of *both* of these approaches (>1 instance of WFMO/select and thread pooling). In some cases, you may want to do this on UNIX too.
If one could do everything async, there should be no reason for another thread pool. That would become important only if some connection(s) had to do work that would prevent other connections from progressing fairly (for example, by making blocking calls).
I think the point I am trying to make is that there isn't necessarily one right answer. Pools of threads are good for some things, and I/O multiplexing facilities are good for some things. And in some cases, taken both together they are a good thing as well.
The threads in this pool are only doing I/O multiplexing, so I'm not sure I understand your concern. The kind of thread pool in my implementation is just to have roughly the minimum number of threads compared to connections based on multiplexing limits. Another kind of thread pool would be one/two per connection to handle blocking activity. The blocking methods in the proposal could be used when that was the right answer. Or did I miss your point here? For best possible performance, one would want everything to be async and handled directly in the I/O thread. That would eliminate undesirable context switches. Since a given connection has all its callbacks made from one thread (though that thread can change over time), it wouldn't even need a lock as long as side effects were confined. I guess what I should do is write lots more in the HTML file :) There are many areas left unclear that probably lead to confusion. Again, thanks for the very thoughtful response! Best regards, Don

Don G <dongryphon <at> yahoo.com> writes:
* timeout constructors are inconsistent. One takes seconds + microseconds and one takes milliseconds, at least according to the names of the arguments.
Yeah, that was one of the adjustments I tried to make as I boost-ified the code. The Windows way is millisecs, but Unix is microsecs. To do microsecs, I think we would have to use uint64's. That would be fine by me, but I wasn't sure how widely supported it is. I know that for Solaris (SunPro), gcc, MSVC, CodeWarrior, uint64 is good. Of course, a timespec doesn't use uint64. La de da... :)
Double is a pretty portable type ;-) How about using a double value that represents seconds? Converting from double t to int milliseconds is just static_cast<int>(t * 1000), etc. Bob
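A minimal sketch of the conversion mentioned above, for illustration only. The function name `to_milliseconds` is hypothetical; the body is the `static_cast` from the message. Note that the cast truncates toward zero, so a fractional millisecond is dropped rather than rounded.

```cpp
#include <cassert>

// Hypothetical helper: convert a timeout given as seconds in a double
// into whole milliseconds, as a Windows-style API would expect.
inline int to_milliseconds(double seconds)
{
    assert(seconds >= 0.0);
    // Truncates toward zero; any fraction of a millisecond is discarded.
    return static_cast<int>(seconds * 1000);
}
```

For example, `to_milliseconds(2.5)` yields 2500 and `to_milliseconds(0.25)` yields 250.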

On 4/13/05, Don G <dongryphon@yahoo.com> wrote:
On 4/13/05, Caleb wrote:
Overall it looks pretty nice.
Thanks. :) And thanks for taking the time to look at the proposal.
A few observations: * port_t should be unsigned short, not unsigned int
That is, of course, true of TCP/IP :)
Hm, hadn't thought of non-IP. What protocols out there which support a port/channel concept use a larger port range than TCP/IP? Just curious.
* timeout constructors are inconsistent. One takes seconds + microseconds and one takes milliseconds, at least according to the names of the arguments.
Yeah, that was one of the adjustments I tried to make as I boost-ified the code. The Windows way is millisecs, but Unix is microsecs. To do microsecs, I think we would have to use uint64's. That would be fine by me, but I wasn't sure how widely supported it is. I know that for Solaris (SunPro), gcc, MSVC, CodeWarrior, uint64 is good. Of course, a timespec doesn't use uint64. La de da... :)
What about Bob Bell's suggestion of using double? Would there be concerns about the overhead in all of the double -> integer conversions necessary to interface with the various OS-level APIs? Having constructors from a few possible argument types (e.g. double, int seconds + int <subseconds type TBD>) seems to make sense. As far as the appropriate subseconds type goes, we should probably pick the highest-possible resolution that makes sense, which I'd contend is probably microseconds. Some operating systems may be able to slice time (and signal events) at resolutions below milliseconds, but I doubt any can go deeper than microseconds.
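To give a feel for the double -> integer conversion cost being asked about, here is a rough sketch of splitting a double into whole seconds plus microseconds, the shape that Unix-style APIs expect. The struct `sec_usec` and the function `split_seconds` are hypothetical names; the struct only mirrors the layout of a POSIX `timeval` so the example stays portable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a POSIX timeval: whole seconds plus microseconds.
struct sec_usec {
    long sec;
    long usec;
};

// Split a non-negative timeout in seconds into the two integer fields.
inline sec_usec split_seconds(double seconds)
{
    assert(seconds >= 0.0);
    double whole = 0.0;
    // modf returns the fractional part and stores the integral part in `whole`.
    const double frac = std::modf(seconds, &whole);
    sec_usec result;
    result.sec = static_cast<long>(whole);
    result.usec = static_cast<long>(frac * 1e6);  // truncates below 1us
    return result;
}
```

The work is one `modf` call, one multiply and two casts per timeout, which is likely small next to the system call that follows.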
Perhaps "address_specifier" would be better? I can imagine some sort of "stream factory" class with pluggable protocols (e.g. http, https, ftp, etc) that would take fully qualified URLs to create new streams, but in this case you're dealing with at most a hostname and port.
Well, the semantics of such things quickly become non-streams. The intention of the scheme ("http") is to establish a port by means of name lookup. In the end, yes the URL is resolved to address/port. If an HTTP library was built on top, it would do no transformation before passing its URL to new_address(). Hence, the type "url".
I think I see your point here, but I'm not 100% convinced that this is the right place to put url. It seems to me that the concept of URL belongs at a higher level, since the only portion you can use at the network level is a hostname and port. It might be expedient for a high-level HTTP library to be able to pass URLs all the way down to the network level, but it would be nearly as easy for that library to make use of its own URL object which had methods for extracting the "address_specifier" information. Does anyone else have an opinion on this?
I'd contend that in general, an operating system supplied multiplexing facility will scale better than one that uses a thread to handle each connection.
Absolutely. The code I have in mind uses a thread pool and connections are managed by as few a number of threads as necessary. On Windows, each I/O thread can handle 64-ish connections. On Unix, it is much higher. By default, I create one I/O thread per processor and divvy up the connections.
Without some sort of multiplexing facility, how do you know when a channel is ready for I/O? It seems that your proposal is to use a pool of threads to handle async/non-blocking operations, but I don't see any interfaces defined to control or manage these operations. Is that just TBD?
I have contemplated just how a user would want to interact with the thread pool behind the scenes, but, it is ideally an issue of minimal concern. Of course, for advanced users it could be something that needs to be tweaked. In that sense, there is a TBD aspect. I just have yet to see a good abstraction/mechanism for doing this.
OK, having read a number of your posts on this subject, I think I am starting to understand your position w/r/t event dispatch. Correct me if I'm wrong, but your contention is that these mechanisms are best left hidden from the user, and the interface should expose only synchronous operations or async/non-blocking operations with some sort of callback mechanism. How those are implemented is not exposed. This is a bit of a paradigm shift for someone (e.g. me) who is comfortable with select and fd_sets, etc, and even some of the higher level abstractions like ACE_Reactor. But I can see the value in this hiding approach and might warm to it if the implementation is easy to use and performs well.
I think the point I am trying to make is that there isn't necessarily one right answer. Pools of threads are good for some things, and I/O multiplexing facilities are good for some things. And in some cases, taken both together they are a good thing as well.
The threads in this pool are only doing I/O multiplexing, so I'm not sure I understand your concern. The kind of thread pool in my implementation is just to have roughly the minimum number of threads compared to connections based on multiplexing limits. Another kind of thread pool would be one/two per connection to handle blocking activity. The blocking methods in the proposal could be used when that was the right answer. Or did I miss your point here?
The point was several paragraphs of navel-gazing about how the underlying multiplexing implementation might work. Clearly you understand the mechanisms involved, so my dialectic was wasted :) Anyway, yours is a new approach to me, but I think I can see its value and would be interested in trying it out. -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein wrote:
[...] I think I see your point here, but I'm not 100% convinced that this is the right place to put url. It seems to me that the concept of URL belongs at a higher level, since the only portion you can use at the network level is a hostname and port. It might be expedient for a high-level HTTP library to be able to pass URLs all the way down to the network level, but it would be nearly as easy for that library to make use of its own URL object which had methods for extracting the "address_specifier" information.
Does anyone else have an opinion on this?
I agree with you that URLs and the like belong to a higher level. Level 0 will ultimately depend on C APIs like Berkeley sockets. Concepts like URLs will be implemented in a higher level based on whatever we have in level 0. I put the packages which were discussed before on this list in http://www.highscore.de/boost/net/packages.png. However, there are packages missing because I don't know how to define them. There are various ideas like the concept of URLs, but I don't want to put everything in a level 1 package because we have different goals here, too. If we assume that there will be a package called boost::net::iostream which provides synchronous I/O operations for sockets on a higher level, I don't want to put URL into this package. So what is the package that URL belongs to about? Answering this would help to get a more complete picture of all the dependencies of a network library. Boris
[...]

Caleb Epstein <caleb.epstein <at> gmail.com> writes:
As far as the appropriate subseconds type goes, we should probably pick the highest-possible resolution that makes sense, which I'd contend is probably microseconds. Some operating systems may be able to slice time (and signal events) at resolutions below milliseconds, but I doubt any can go deeper than microseconds.
I wouldn't take that bet. I know Mac OS X can measure time as finely as nanoseconds (but I have no idea how many services, i.e. sockets, actually work at nanosecond resolutions). It doesn't seem outside the realm of possibility that, given the way technologies advance, within a few short years microseconds simply won't be fine enough. One of the nice things about double-as-time-unit is that it avoids resolution issues altogether. Bob
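As a rough check on the resolution a double actually offers, the gap between adjacent representable values can be measured directly. This is a sketch assuming IEEE 754 doubles and C++11; the function name `resolution_at` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Gap between `seconds` and the next representable double above it,
// i.e. the finest time step a double can express at that magnitude.
inline double resolution_at(double seconds)
{
    return std::nextafter(seconds, std::numeric_limits<double>::infinity())
           - seconds;
}
```

Around 1 second the step is 2^-52 s (about 2.2e-16 s), far below a nanosecond, so short relative timeouts lose nothing. Around 1e9 seconds the step grows to 2^-23 s (about 0.12 microseconds), so the claim holds most comfortably for relative timeouts rather than large absolute times.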

Hi Caleb,
Hm, hadn't thought of non-IP. What protocols out there which support a port/channel concept use a larger port range than TCP/IP? Just curious.
When I was writing the HTTP/reflection system, we (at my work) noodled around with a Mac-like 4CHARS concept (L'HTTP' or L'PTTH' for the little-endians<G>) for ports.
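To illustrate why such a scheme would not fit in an unsigned short, here is a sketch of packing four characters into a 32-bit value. The function name `four_chars` and the big-endian packing order are assumptions made for the example; the original code is not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical: pack four characters into one 32-bit port identifier,
// first character in the most significant byte.
constexpr std::uint32_t four_chars(char a, char b, char c, char d)
{
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)
         |  static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}
```

In ASCII, `four_chars('H', 'T', 'T', 'P')` is 0x48545450, well above the 65535 ceiling of a 16-bit port, which is the case for keeping port_t wider than unsigned short.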
* timeout constructors are inconsistent. One takes
What about Bob Bell's suggestion of using double? Would there be concerns about the overhead in all of the double -> integer conversions necessary to interface with the various OS-level APIs? Having constructors from a few possible argument types (e.g. double, int seconds + int <subseconds type TBD>) seems to make sense.
I like the multiple ctors approach. Is this roughly what you have in mind? timeout (uint_t seconds, uint_t micros); timeout (uint64_t microsecs); timeout (double seconds);
As far as the appropriate subseconds type goes, we should probably pick the highest-possible resolution that makes sense, which I'd contend is probably microseconds. Some operating systems may be able to slice time (and signal events) at resolutions below milliseconds, but I doubt any can go deeper than microseconds.
I agree. I can't imagine ever specifying any timeout < 1us. :)
I think I see your point here, but I'm not 100% convinced that this is the right place to put url. It seems to me that the concept of URL belongs at a higher level, since the only portion you can use at the network level is a hostname and port. It might be expedient for a high-level HTTP library to be able to pass URLs all the way down to the network level, but it would be nearly as easy for that library to make use of its own URL object which had methods for extracting the "address_specifier" information.
I guess that I felt the world only needed one way to describe an address in textual form. Different consumers want different pieces, but they can all "just get along"; the URL is plenty accommodating.<g> Also, there is a "hidden" aspect to this. For example, "http://www.boost.org" is as valid to new_address as "www.boost.org:80". The "http" scheme provides indirection back to the port mapping table and, therefore, configuration by the app. Now, "http" might not be the best example, but I've used this technique for homegrown protocols. This allowed me to have an indirection between config file URL values and actual port number.
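The indirection described above might look roughly like the following. This is a sketch, not the proposal's implementation: `host_port`, `resolve` and the `scheme_ports` table are hypothetical names, and the parsing deliberately ignores user info and IPv6 literals.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical result type: what the network level ends up needing.
struct host_port {
    std::string host;
    unsigned port;
};

// Hypothetical resolver. `scheme_ports` stands in for the app-configurable
// port mapping table mentioned above.
inline host_port resolve(const std::string& spec,
                         const std::map<std::string, unsigned>& scheme_ports)
{
    host_port result{std::string(), 0};
    std::string rest = spec;

    // An optional "scheme://" prefix selects a default port from the table.
    const std::string::size_type sep = rest.find("://");
    if (sep != std::string::npos) {
        auto it = scheme_ports.find(rest.substr(0, sep));
        if (it != scheme_ports.end())
            result.port = it->second;
        rest.erase(0, sep + 3);
    }

    // Drop any path or query; only the authority part matters here.
    rest = rest.substr(0, rest.find_first_of("/?"));

    // An explicit ":port" overrides the scheme default.
    const std::string::size_type colon = rest.rfind(':');
    if (colon != std::string::npos) {
        result.port = static_cast<unsigned>(
            std::strtoul(rest.c_str() + colon + 1, nullptr, 10));
        rest.erase(colon);
    }
    result.host = rest;
    return result;
}
```

With a table mapping "http" to 80, both "http://www.boost.org" and "www.boost.org:80" resolve to the same host and port, and a homegrown scheme can be repointed by editing the table rather than every config file value.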
Does anyone else have an opinion on this?
Anyone? :)
I'd contend that in general, an operating system supplied multiplexing facility will scale better than one that uses a thread to handle each connection.
Absolutely. The code I have in mind uses a thread pool and connections are managed by as few a number of threads as necessary.
OK, having read a number of your posts on this subject, I think I am starting to understand your position w/r/t event dispatch. Correct me if I'm wrong, but your contention is that these mechanisms are best left hidden from the user, and the interface should expose only synchronous operations or async/non-blocking operations with some sort of callback mechanism. How those are implemented is not exposed.
Exactly - at least within the limits of the English language vs. C++<g>.
This is a bit of a paradigm shift for someone (e.g. me) who is comfortable with select and fd_sets, etc, and even some of the higher level abstractions like ACE_Reactor. But I can see the value in this hiding approach and might warm to it if the implementation is easy to use and performs well.
I can say that I have found it very easy to use. :) Performance is always a matter of what measure one wants to use (or rather, what test case). I have stressed it well beyond the needs of my own applications (see previous posts), and found that it performed acceptably. I would like to see how well it performs for other folks. If performance issues turn up, I am confident that a great deal of optimization can be applied with little or no impact on the interfaces (the benefit of encapsulation, right? ;).
The threads in this pool are only doing I/O multiplexing, so I'm not sure I understand your concern.
The point was several paragraphs of navel-gazing about how the underlying multiplexing implementation might work. Clearly you understand the mechanisms involved, so my dialectic was wasted :)
Anyway, yours is a new approach to me, but I think I can see its value and would be interested in trying it out.
I am proceeding along fleshing out the boost-ified interfaces, but the sandbox is not so friendly a place to post constantly changing files. Is there an FTP-like repository or public CVS/whatever? Perhaps I need to create a folder in the sandbox, so I could delete obsolete files. I'm not sure the best answer here... Best, Don

Don G <dongryphon <at> yahoo.com> writes:
* timeout constructors are inconsistent. One takes
What about Bob Bell's suggestion of using double? Would there be concerns about the overhead in all of the double -> integer conversions necessary to interface with the various OS-level APIs? Having constructors from a few possible argument types (e.g. double, int seconds + int <subseconds type TBD>) seems to make sense.
I like the multiple ctors approach. Is this roughly what you have in mind?
timeout (uint_t seconds, uint_t micros); timeout (uint64_t microsecs); timeout (double seconds);
I don't have any strong opinions on the network library design, but since I'm sort of involved due to my suggestion about using doubles, I want to point out that a concern I have with multiple constructors is the potential for ambiguity: timeout t(100); If there is an automatic conversion from int to uint64_t (whatever that is), this would be ambiguous. Further, there's an ambiguity to the reader: what is the intention of this definition, 100 seconds or 100 microseconds? With a single constructor, both issues go away. Bob
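The ambiguity can be seen with a small sketch of the three-constructor class. The class body is an assumption made for illustration (the thread only gives the constructor signatures), with `unsigned` standing in for `uint_t` and a hypothetical `microseconds()` accessor added so the result can be inspected.

```cpp
#include <cstdint>

// Sketch of the three-constructor timeout discussed above.
class timeout {
public:
    timeout(unsigned seconds, unsigned micros)
        : usec_(static_cast<std::uint64_t>(seconds) * 1000000u + micros) {}
    explicit timeout(std::uint64_t microsecs)
        : usec_(microsecs) {}
    explicit timeout(double seconds)
        : usec_(static_cast<std::uint64_t>(seconds * 1e6)) {}

    std::uint64_t microseconds() const { return usec_; }

private:
    std::uint64_t usec_;
};

// timeout t(100);
// The line above does not compile: int -> std::uint64_t and int -> double
// are both standard conversions of the same rank, so neither constructor
// is preferred and the call is ambiguous.
```

Callers have to spell out the type, for example `timeout(std::uint64_t{100})` for 100 microseconds or `timeout(100.0)` for 100 seconds, which resolves the compiler's ambiguity but still leaves the reader guessing at the unit.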
participants (4): Bob Bell, Boris, Caleb Epstein, Don G