
Scott Woods wrote:
1. The difference (in terms of CPU time) between maintaining a counter and inspecting a "current byte" to test it for "end of message" seems minimal. This is a relative claim: it is far more significant that the bytes sent across the network are being scanned at the receiver more than once. Even maintaining the body counter is a (very low cost...) scan.
2. An approach using lex+parse techniques accepts raw byte blocks as input (convenient) and notifies the user, through some kind of accept/reduce return code, that the message is complete and already "broken apart", i.e. no further scanning is required by higher layers.
3. Lex+parse techniques do not care about block lengths. An accept state or parser reduction can occur anywhere, so none of the "unget" contortions recently mentioned are needed. Partial messages are retained in the parser stack and only finally assembled on accept/reduce. This property is much easier to live with than any kind of "fixed-size" approach I have dealt with so far.
This is the kind of application of a network library I'm most intrigued by. I've experimented with an approximation of this approach: I modified a sinister buffering scheme in a C# application, replacing it with apparently inefficient calls to the equivalents of send and receive that fetched only one byte at a time, and implemented a simple lexer on top. I expected terrible losses but experienced very few; later, reapplying a buffering layer at only two particular points made the difference very difficult to measure.
First. We, unfortunately, can't pass std::vector to the operating system, so, at some point, we are allocating fixed-size buffers and passing them to our IO primitives. There is no escape.
Errrrr. Not quite following that. Are you saying that
send( socket_descriptor, &vector_buffer[ 0 ], vector_buffer.size() )
is bad?
No. What I meant was, the operating system won't resize std::vector for you. It expects a fixed-size amount of memory. Because of this, every "dynamically safe buffering" must be a layer over a "fixed size error-prone" buffering done somewhere. That is a constraint of our primitives. The intention of a streambuf implementation is precisely to conceal such a fixed size buffering, offering the most generic interface to what now becomes a concealed "sequence" (as the documentation I have at hand would call it).
Yes, you make some very good points. The product I am currently working on is a vipers' nest of the protocols you talk about and more. There have been some unpleasant suggested uses for protocols such as IMAP4. Trying to build a generic network messaging library that facilitates clear, concise application protocols *and* can cope with the likes of IMAP4 is, I believe, unrealistic.
The skeleton of a "protocol message" as I've been working on it is more or less:

//----------
class protocol_message
{
public:
    void clear () { /* Clear data members. */ }

    template <typename IteratorT>
    parse_info<IteratorT> parse (IteratorT begin, IteratorT end); // Defined later.
};

// However a message is represented on the net...
template <typename CharT, typename TraitsT>
basic_ostream<CharT, TraitsT>&
operator<< (basic_ostream<CharT, TraitsT>& o, protocol_message const& m);

template <typename CharT, typename TraitsT>
basic_istream<CharT, TraitsT>&
operator>> (basic_istream<CharT, TraitsT>& i, protocol_message& m)
{
    using namespace boost::spirit; // Here we use the Magic Glue

    typedef multi_pass<std::istreambuf_iterator<char> > iterator_t;
    iterator_t begin = make_multi_pass(std::istreambuf_iterator<char>(i));
    iterator_t end = make_multi_pass(std::istreambuf_iterator<char>());

    parse_info<iterator_t> info = m.parse(begin, end);
    if (!info.hit)
        i.setstate(std::ios_base::failbit);
    return i;
}

namespace detail
{
    class grammar : public boost::spirit::grammar<grammar>
    {
    public:
        grammar (protocol_message& m) : _M_m(m) {}

        template <typename ScannerT>
        class definition;

    private:
        // We'll write into _M_m even though Spirit passes the grammar
        // around by const reference.
        protocol_message& _M_m;
    };
}

template <typename IteratorT>
boost::spirit::parse_info<IteratorT>
protocol_message::parse (IteratorT begin, IteratorT end)
{
    using namespace boost::spirit;
    this->clear();
    detail::grammar g(*this);
    return boost::spirit::parse(begin, end, g);
}
//----------

Note how operator>> sets failbit in case of an unsuccessful parse; it allows us to write:

    iostream stream;
    protocol_message message;
    while (stream >> message) {
        // Work.
    }
    // Parsing failed or other error; try to recover?

No exception is thrown. But an exception could be thrown: an iostream can be configured to do that, and throw an ios_base::failure.
The current implementation of the irc_client example distributed in the package I uploaded to the Sandbox is at this URI: https://mndfck.org/svn/socketstream/branches/boost/libs/network/example/irc_... This version has a Spirit grammar for a (modified) version of the IRC grammar as defined in RFC 2812. It's still rough around the edges, but much better than it used to be. IRC is a very uninteresting application, but it's an interesting protocol to experiment with, as there is no guarantee of when a message is coming, or from where. "Synchronized" protocols like SMTP are much easier: the client sends, the server responds, and that's pretty much it. I'm very interested in these kinds of applications of a "netbuf" and in the implementation of reusable "protocol message" classes for common protocols; I'll probably go after HTTP next and try to write a simplified wget. There was also a concern earlier in this thread about excessive buffering in streambufs with "fixed-size message" protocols that I'd like to address with an example.

--
Pedro Lamarão
Desenvolvimento
Intersix Technologies S.A.
SP: (55 11 3803-9300)
RJ: (55 21 3852-3240)
www.intersix.com.br
Your Security is our Business