Re: [boost] C++ Networking Library Release 0.5

1 Feb 2010

Hi Peter,

Peter Petrov wrote:
...
On Mon, Feb 1, 2010 at 1:26 PM, Glyn Matthews <glyn.matthews@gmail.com> wrote:
...
On 31 January 2010 12:08, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
James Mansion wrote:
...
Phil Endecott wrote:
...
I have an HTTP request parser using Spirit, if you're interested.  It is
a bit grotty as I wrote it as my first exercise using Spirit - but it does
work.  http://svn.chezphil.org/libpbe/trunk/src/parse_http_request.cc.
Out of interest, is the parser suitable to use as a tutorial on how to
translate from RFC specs?
You're welcome to use it in that way if you wish. Most of it was translated
directly from the BNF in the RFCs.
I would say that this is on-topic as it is an issue that we face in
implementing cpp-netlib.  Currently, the request parser in the HTTP server
is taken from Boost.Asio HTTP example but I'm certain that this can be
improved.
Let me chime in, as I've recently developed an Asio-based HTTP server as
well.
First, Spirit is unsuitable for the task - it consumes all the input in one
pass, and doesn't support the case when the HTTP request arrives in more
than one read. The real solution is a state-machine-based parser, just like
the one in the Asio HTTP example.
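
To make the state-machine idea concrete, here is a minimal sketch of a resumable parser for the HTTP request line only. It is not the code from the Asio HTTP example; the names RequestLineParser, ParseResult and feed are hypothetical, and the grammar is deliberately simplified (for instance, methods are assumed to be upper-case letters only). The point is that the parser keeps its state between calls, so it does not matter how the input is split across reads.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; a simplified illustration, not the Asio example parser.
enum class ParseResult { NeedMore, Done, Error };

class RequestLineParser {
public:
    // Feed one chunk of input. May be called again with the next chunk
    // whenever the previous call returned NeedMore.
    ParseResult feed(const std::string& chunk) {
        for (char c : chunk) {
            ParseResult r = consume(c);
            if (r != ParseResult::NeedMore) return r;
        }
        return state_ == State::Done ? ParseResult::Done : ParseResult::NeedMore;
    }

    const std::string& method() const { return method_; }
    const std::string& target() const { return target_; }
    const std::string& version() const { return version_; }

private:
    enum class State { Method, Target, Version, ExpectLF, Done, Failed };

    ParseResult fail() {
        state_ = State::Failed;
        return ParseResult::Error;
    }

    // Advance the state machine by exactly one character.
    ParseResult consume(char c) {
        switch (state_) {
        case State::Method:
            if (c == ' ') {
                if (method_.empty()) return fail();
                state_ = State::Target;
            } else if (c >= 'A' && c <= 'Z') {
                method_ += c;  // simplification: upper-case letters only
            } else {
                return fail();
            }
            return ParseResult::NeedMore;
        case State::Target:
            if (c == ' ') {
                if (target_.empty()) return fail();
                state_ = State::Version;
            } else if (c == '\r' || c == '\n') {
                return fail();
            } else {
                target_ += c;
            }
            return ParseResult::NeedMore;
        case State::Version:
            if (c == '\r') {
                if (version_.empty()) return fail();
                state_ = State::ExpectLF;
            } else if (c == ' ' || c == '\n') {
                return fail();
            } else {
                version_ += c;
            }
            return ParseResult::NeedMore;
        case State::ExpectLF:
            if (c != '\n') return fail();
            state_ = State::Done;
            return ParseResult::Done;
        case State::Done:
            return ParseResult::Done;
        case State::Failed:
            return ParseResult::Error;
        }
        assert(false && "unreachable state");
        return ParseResult::Error;
    }

    State state_ = State::Method;
    std::string method_, target_, version_;
};
```

Feeding "GET /ind", then "ex.html HT", then "TP/1.1\r", then "\n" yields NeedMore three times and finally Done, with the same fields as if the whole line had arrived in one read.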
I disagree in general.  My parser is primarily an HTTP request _header_ 
parser, and the headers are normally relatively small.  For most 
requests (i.e. GETs) the request body doesn't add much, and in those 
cases it is likely that the whole request can be got in a single read.  
In fact browser implementations go to some lengths to make their 
requests fit in single network packets (about 1500 bytes) for 
performance reasons, and single network packets will generally be 
accessible as single reads.

I normally use this code in a thread-per-connection environment, but if 
you wanted to use it in a single-threaded system you would need to 
modify it to detect incomplete input in the (rare) case when the input 
was split over multiple packets.
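
One way such a modification might look, as a sketch only: buffer the reads and hand the data to the single-pass parser once the blank line that ends the header block has arrived. HeaderBuffer and its members are hypothetical names, not part of libpbe or cpp-netlib, and the sketch assumes headers are terminated by CRLF CRLF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: accumulates reads until a complete HTTP header
// block (terminated by "\r\n\r\n") is available.
class HeaderBuffer {
public:
    // Append newly read bytes. Returns true once the full header is buffered.
    bool append(const std::string& chunk) {
        if (complete()) {          // header already found: extra bytes are body
            data_ += chunk;
            return true;
        }
        // Restart the search 3 bytes before the old end, in case the
        // terminator straddles two reads.
        std::size_t from = data_.size() > 3 ? data_.size() - 3 : 0;
        data_ += chunk;
        std::size_t pos = data_.find("\r\n\r\n", from);
        if (pos != std::string::npos) header_len_ = pos + 4;
        return complete();
    }

    bool complete() const { return header_len_ != 0; }

    // The header block including its terminating blank line; empty until complete.
    std::string header() const { return data_.substr(0, header_len_); }

    // Bytes that arrived after the header (the start of the body, if any).
    std::string remainder() const {
        return complete() ? data_.substr(header_len_) : std::string();
    }

private:
    std::string data_;
    std::size_t header_len_ = 0;
};
```

In the common case described above, the first append() returns true and the cost is one extra string search; only when it returns false does the caller need to wait for another read.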

In the case of HTTP POST and PUT requests, on the other hand, the body 
(but not the header) can be large, and parsing it incrementally as it 
arrives probably is necessary.  I noticed a BoostCon paper about a MIME 
parser (Marshall?) - this would definitely benefit from working 
incrementally in many applications.
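
As a rough illustration of handling a body incrementally, the sketch below passes each chunk to a callback as it arrives instead of buffering the whole body. It assumes the length is already known from a Content-Length header (chunked transfer encoding is not handled), and BodyReader is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: forwards body data to a sink piece by piece,
// stopping after content_length bytes.
class BodyReader {
public:
    using Sink = std::function<void(const std::string&)>;

    BodyReader(std::size_t content_length, Sink sink)
        : remaining_(content_length), sink_(std::move(sink)) {}

    // Consume as much of the chunk as belongs to this body.
    // Returns the number of bytes consumed; any excess is left to the caller.
    std::size_t feed(const std::string& chunk) {
        std::size_t n = chunk.size() < remaining_ ? chunk.size() : remaining_;
        if (n > 0) sink_(chunk.substr(0, n));
        remaining_ -= n;
        return n;
    }

    bool done() const { return remaining_ == 0; }

private:
    std::size_t remaining_;
    Sink sink_;
};
```

The sink could write to a file or drive an incremental MIME parser; memory use then depends on the chunk size rather than on the size of the upload.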
...
In my case, I used an automatically generated parser from EBNF, via Ragel
(http://www.complang.org/ragel/). The grammar itself I "borrowed" from the
sandbox version of Lighttpd, which uses the same approach. Link:
http://redmine.lighttpd.net/projects/lighttpd-sandbox/repository/revisions/m...
Ragel is the best solution I'm aware of, and it's easy to integrate its
output into Boost-style C++ code. I've not yet benchmarked my solution
against the Asio HTTP example parser for performance, but I assume they are
close.
This is interesting, and I'll have a look at it next time I need to do 
some BNF-like parsing.

Regards,  Phil.