
Scott Woods wrote:
1. The difference (in terms of CPU time) between maintaining a counter and inspecting a "current byte" to test it for "end of message" seems minimal. This is a relative claim: it is far more significant that the bytes sent across the network are being scanned at the receiver more than once. Even maintaining the body counter is a (very low cost...) scan.
2. An approach using lex+parse techniques accepts raw byte blocks as input (convenient) and notifies the user, through some kind of accept/reduce return code, that the message is complete and already "broken apart", i.e. no further scanning is required by higher layers.
3. Lex+parse techniques do not care about block lengths. An accept state or parser reduction can occur anywhere, so none of the "unget" contortions recently mentioned are needed. Partial messages are retained in the parser stack and only finally assembled on accept/reduce. This property is much easier to live with than any kind of "fixed-size" approach I have dealt with so far.
This is the kind of application of a network library I'm most intrigued by. I've experimented with an approximation of this approach: I modified a sinister buffering scheme in a C# application, replacing it with apparently inefficient calls to the equivalents of send and receive that fetched only one byte at a time, and implemented a simple lexer on top. I expected terrible losses but experienced very few; later, reapplying a buffering layer at only two particular points made the difference very difficult to measure.
First. We, unfortunately, can't pass std::vector to the operating system, so, at some point, we are allocating fixed-size buffers and passing them to our IO primitives. There is no escape.
Errrrr. Not quite following that. Are you saying that
send( socket_descriptor, &vector_buffer[ 0 ], vector_buffer.size() )
is bad?
No. What I meant was, the operating system won't resize std::vector for you. It expects a fixed-size amount of memory. Because of this, every "dynamically safe buffering" must be a layer over a "fixed size error-prone" buffering done somewhere. That is a constraint of our primitives. The intention of a streambuf implementation is precisely to conceal such a fixed size buffering, offering the most generic interface to what now becomes a concealed "sequence" (as the documentation I have at hand would call it).
Yes, you make some very good points. The product I am currently working on is a vipers' nest of the protocols you talk about and more. There have been some unpleasant suggested uses for protocols such as IMAP4. Trying to build a generic network messaging library that facilitates clear, concise application protocols *and* can cope with the likes of IMAP4 is, I believe, unrealistic.
The skeleton of a "protocol message" as I've been working on it is more or less:

//----------
class protocol_message
{
public:
    void clear () { /* Clear data members. */ }

    template <typename IteratorT>
    parse_info<IteratorT> parse (IteratorT begin, IteratorT end); // Defined later.
};

// However a message is represented on the net...
template <typename CharT, typename TraitsT>
basic_ostream<CharT, TraitsT>&
operator<< (basic_ostream<CharT, TraitsT>& o, protocol_message const& m);

template <typename CharT, typename TraitsT>
basic_istream<CharT, TraitsT>&
operator>> (basic_istream<CharT, TraitsT>& i, protocol_message& m)
{
    using namespace boost::spirit; // Here we use the Magic Glue

    typedef multi_pass<std::istreambuf_iterator<char> > iterator_t;
    iterator_t begin = make_multi_pass(std::istreambuf_iterator<char>(i));
    iterator_t end = make_multi_pass(std::istreambuf_iterator<char>());

    parse_info<iterator_t> info = m.parse(begin, end);
    if (!info.hit)
        i.setstate(std::ios_base::failbit);
    return i;
}

namespace detail
{
    class grammar : public boost::spirit::grammar<grammar>
    {
    public:
        grammar (protocol_message& m) : _M_m(m) {}

        template <typename ScannerT>
        class definition;

    private:
        // We'll write into _M_m even though Spirit passes the grammar
        // around by const reference.
        protocol_message& _M_m;
    };
}

template <typename IteratorT>
boost::spirit::parse_info<IteratorT>
protocol_message::parse (IteratorT begin, IteratorT end)
{
    using namespace boost::spirit;
    this->clear();
    detail::grammar g(*this);
    return boost::spirit::parse(begin, end, g);
}
//----------

Note how operator>> sets failbit in case of an unsuccessful parse; it allows us to write:

    iostream stream;
    protocol_message message;
    while (stream >> message) {
        // Work.
    }
    // Parsing failed or other error; try to recover?

No exception is thrown. But an exception could be thrown: an iostream can be configured to do that, and throw an ios_base::failure.
The current implementation of the irc_client example distributed in the package I uploaded to the Sandbox is at this URI: https://mndfck.org/svn/socketstream/branches/boost/libs/network/example/irc_... This version has a Spirit grammar for a (modified) version of the IRC grammar as defined in RFC 2812. It's still rough around the edges, but much better than it used to be. IRC is a very uninteresting application, but it's an interesting protocol to experiment with, as there is no guarantee of when a message is coming, or from where. "Synchronized" protocols like SMTP are much easier: the client sends, the server responds, and that's pretty much it. I'm very interested in these kinds of applications of a "netbuf" and in the implementation of reusable "protocol message" classes for common protocols; I'll probably go after HTTP next and try to write a simplified wget. There was also a concern earlier in this thread about excessive buffering in streambufs with "fixed-size message" protocols that I'd like to address with an example.

--
Pedro Lamarão
Desenvolvimento
Intersix Technologies S.A.
SP: (55 11 3803-9300)
RJ: (55 21 3852-3240)
www.intersix.com.br
Your Security is our Business