
Hi Pedro,

Still trying to get my Outlook to indent (>) and failing. I have all the proper options set but they are ignored. Go figure. Re-install time.

I've inserted my comments with *

----- Original Message -----
From: "Pedro Lamarão" <pedro.lamarao@intersix.com.br>
3. Lex+parse techniques do not care about block lengths. An accept state or parser reduction can occur anywhere. All the "unget" contortions recently mentioned are not needed. Partial messages are retained in the parser stack and only finally assembled on accept/reduce. This property is something much easier to live with than any kind of "fixed-size" approach that I have dealt with so far.
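A minimal sketch of the property described in point 3, assuming a hypothetical line-oriented protocol whose messages end in "\r\n". The parser keeps the partial message as state between calls, so it makes no difference where the network block boundaries fall. The names `line_parser`, `feed` and `parse_in_blocks` are illustrative only and do not come from any library discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical incremental parser: accepts blocks of any size and emits a
// message each time the terminator "\r\n" has been recognised.
class line_parser {
public:
    // Feed one network block; completed messages are appended to `out`.
    void feed(const char* data, std::size_t size, std::vector<std::string>& out) {
        for (std::size_t i = 0; i < size; ++i) {
            const char c = data[i];
            if (seen_cr_) {
                seen_cr_ = false;
                if (c == '\n') {              // accept: the message is complete
                    out.push_back(std::move(partial_));
                    partial_.clear();
                    continue;
                }
                partial_.push_back('\r');     // a lone CR is ordinary data
            }
            if (c == '\r') {
                seen_cr_ = true;              // may be the start of a terminator
            } else {
                partial_.push_back(c);
            }
        }
    }

private:
    std::string partial_;    // message text retained across block boundaries
    bool seen_cr_ = false;   // lexer state retained across block boundaries
};

// Simulate a network that delivers `input` in blocks of `block_size` bytes.
std::vector<std::string> parse_in_blocks(const std::string& input, std::size_t block_size) {
    assert(block_size > 0);
    line_parser parser;
    std::vector<std::string> messages;
    for (std::size_t pos = 0; pos < input.size(); pos += block_size) {
        const std::size_t n = std::min(block_size, input.size() - pos);
        parser.feed(input.data() + pos, n, messages);
    }
    return messages;
}
```

Feeding the same bytes one at a time, three at a time, or in a single block yields the same messages; no "unget" is needed because nothing is ever read past what the parser has already consumed.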
From your code snippets I can see the layering of activity and how
This is the kind of application of a network library I'm most intrigued by. I've experimented with an approximation of this approach by modifying a sinister buffering scheme in a C# application to use apparently inefficient calls to the equivalents of send and receive that fetch only one byte at a time, and implementing a simple lexer on top. I expected terrible losses but saw very little. Later, reapplying a buffering layer at only two particular points made the difference very difficult to measure.

* Ah yes. Don't know about sinister buffering or C# but think
* I follow enough from context. And your observations are consistent
* with what I have seen.

[snip code]

    iostream stream;
    protocol_message message;
    while (stream >> message) {
        // Work.
    }

* Very nice.

No exception is thrown. But an exception could be thrown; iostream can be configured to do that, and throw an ios_base::failure.

The current implementation of the irc_client example distributed in the package I uploaded to the Sandbox is at this URI:

https://mndfck.org/svn/socketstream/branches/boost/libs/network/example/irc_client/message.hpp

* I did try to decompress your package with Windows utilities. These failed
* with messages about "not bzip2"; can you indicate a specific utility?

This version has a Spirit grammar for a (modified) version of the IRC grammar as defined in RFC 2812. It's still rough around the edges, but much better than it used to be.

IRC is a very uninteresting application, but it's an interesting protocol to experiment with, as there is no guarantee of when a message will arrive or from where. "Synchronized" protocols like SMTP are much easier; client sends, server responds, and that's pretty much it.

I'm very interested in these kinds of applications of a "netbuf" and the implementation of reusable "protocol message" classes for common protocols; I'm probably going after HTTP next, and will try to write a simplified wget.
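The `while (stream >> message)` loop quoted earlier in this message can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `protocol_message` below is a hypothetical one-line message, not the class from the irc_client example, and `std::istringstream` would stand in for the socket stream. The helper names `read_all` and `throws_on_exhaustion` are invented for the sketch.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type: one line of text.
struct protocol_message {
    std::string text;
};

// Extraction follows the usual iostream contract: when no message can be
// read, std::getline sets failbit, which ends a `while (stream >> message)`.
std::istream& operator>>(std::istream& in, protocol_message& m) {
    std::string line;
    if (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        m.text = std::move(line);
    }
    return in;
}

// Default configuration: the loop simply stops, no exception is thrown.
std::vector<std::string> read_all(std::istream& stream) {
    std::vector<std::string> texts;
    protocol_message message;
    while (stream >> message) {
        texts.push_back(message.text);
    }
    return texts;
}

// With failbit in the exception mask, the same loop ends by throwing
// std::ios_base::failure instead of returning false.
bool throws_on_exhaustion(std::istream& stream) {
    assert(stream.good());  // exceptions() throws at once if failbit is already set
    stream.exceptions(std::ios_base::failbit);
    protocol_message message;
    try {
        while (stream >> message) {
            // Work.
        }
    } catch (const std::ios_base::failure&) {
        return true;
    }
    return false;
}
```

Whether to enable the exception mask is a design choice: the loop form treats the end of the stream as a normal outcome, while the throwing form pushes it into the error path.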
There was also a concern earlier in this thread about excessive buffering in streambufs with "fixed-sized message" protocols that I'd like to address with an example.

**************************************

Nice use of boost. Did you mention this in the "who's using boost" thread?

ultimately it is flexible enough to cope with the likes of IRC and (possibly :-) IMAP4. My concern about multi-pass is probably superseded by that exact ability to cope with ugly protocols (in some cases the ugliness is more correctly described as part of the encoding).

In previous threads addressing similar issues the suggestion was to use an "envelope" approach; that delivered the same benefits as your low-level header+body.

It is a little bit tragic for me to concede this point, as I have invested quite heavily in a technology that parses straight from the network block to a variant. The variant is capable of holding a vector of variants as a "value" (yes, a recursive definition). Operator>> is overloaded in such a way that you can code in this manner:

    struct routable_message
    {
        unsigned long to_address;
        unsigned long from_address;
        net_variant payload;
    };

    routable_message &
    operator>>( net_variant &v, routable_message &m )
    {
        vector<net_variant> &a = net_array<3>( v ); // Access the expected tuple
        a[ 0 ] >> m.to_address;
        a[ 1 ] >> m.from_address;
        a[ 2 ] >> m.payload;
        return m;
    }

At the point where a variant is completed (e.g. part way through a network block), it is presented to a receiver, e.g.

    void
    message_router::operator()( net_variant &v )
    {
        routable_message m; // named object; a temporary cannot bind to a non-const reference
        operator()( v >> m );
    }

    void
    message_router::operator()( routable_message &m )
    {
        iterator f = find( m.to_address );
        if(f == end())
            return;
        (* f->second )( m.payload, m.from_address );
    }

Hopefully this is enough to show how elegant the code becomes even when dealing with multiple layers of software, i.e. the message_router has no idea what type conversions are performed by the receiver of the payload.
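For readers without access to the variant technology described here, the routing pattern can be approximated in standard C++17. This is a hedged sketch, not the author's implementation: `net_variant` is modelled as a thin wrapper around `std::variant`, `net_array<N>` is a guessed equivalent that checks the tuple size, and the receiver table uses `std::function`. All of these definitions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Assumed shape of a recursive variant: a value, or a vector of variants.
struct net_variant {
    std::variant<unsigned long, std::string, std::vector<net_variant>> value;
};

// Assumed helper: view the variant as a tuple of exactly N elements.
// std::get throws std::bad_variant_access if it is not a vector.
template <std::size_t N>
std::vector<net_variant>& net_array(net_variant& v) {
    auto& a = std::get<std::vector<net_variant>>(v.value);
    if (a.size() != N) throw std::runtime_error("unexpected tuple size");
    return a;
}

void operator>>(net_variant& v, unsigned long& out) {
    out = std::get<unsigned long>(v.value);
}

// "Move" semantics: the payload is moved out, never copied.
void operator>>(net_variant& v, net_variant& out) {
    out = std::move(v);
}

struct routable_message {
    unsigned long to_address = 0;
    unsigned long from_address = 0;
    net_variant payload;
};

routable_message& operator>>(net_variant& v, routable_message& m) {
    std::vector<net_variant>& a = net_array<3>(v);  // access the expected tuple
    a[0] >> m.to_address;
    a[1] >> m.from_address;
    a[2] >> m.payload;
    return m;
}

class message_router {
public:
    using receiver = std::function<void(net_variant& payload, unsigned long from)>;

    void add(unsigned long address, receiver r) { receivers_[address] = std::move(r); }

    // Decode only the envelope; the payload stays opaque to the router.
    // Returns false when no receiver is registered for the destination.
    bool operator()(net_variant& v) {
        routable_message m;
        v >> m;
        auto f = receivers_.find(m.to_address);
        if (f == receivers_.end()) return false;
        f->second(m.payload, m.from_address);
        return true;
    }

private:
    std::map<unsigned long, receiver> receivers_;
};
```

The router touches only the two addresses; whatever conversion the receiver applies to the payload happens one layer down, which is the layering described above.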
All operator>> implementations are required to use "move" semantics, so any data "new'd" by the variant parser is exactly the data that is finally moved into the application type.

To summarize: I have been resisting the header+body (or "envelope") technique but it would appear to be more extensible. The separation of "message completion" and "content parsing" allows for more protocol-specific handling that I cannot do, as my "parser" runs over the entire message.

Again, the protocol specifics that I allude to are often better described as encoding-specific, as most of the TCP application suite binds an encoding inextricably to each protocol. Dealing with continuations and embedded objects (different encoder states) may still exhaust the extensibility of the envelope approach.

There is nothing in the IMAP4 protocol that cannot be represented within something such as my net_variant, i.e. it does not need a protocol-specific encoding. The same for SMTP, HTTP, .... How much simpler it could have been!

gracias,
Scott