
Hi Peter,

Peter Petrov wrote:
On Mon, Feb 1, 2010 at 1:26 PM, Glyn Matthews <glyn.matthews@gmail.com> wrote:
On 31 January 2010 12:08, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
James Mansion wrote:
Phil Endecott wrote:
I have an HTTP request parser using Spirit, if you're interested. It is a bit grotty as I wrote it as my first exercise using Spirit - but it does work: http://svn.chezphil.org/libpbe/trunk/src/parse_http_request.cc
Out of interest, is the parser suitable to use as a tutorial on how to translate from RFC specs?
You're welcome to use it in that way if you wish. Most of it was translated directly from the BNF in the RFCs.
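To illustrate what translating directly from the BNF can look like, here is a minimal hand-written sketch for one rule from RFC 2616, HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT. It is plain C++ rather than Spirit, and it is not taken from the parser linked above; the names parse_literal, parse_digits, parse_http_version and HttpVersion are hypothetical, chosen only for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct HttpVersion {
    unsigned major_version = 0;
    unsigned minor_version = 0;
};

// Terminal: a literal such as "HTTP" or "/". Advances pos only on a match.
bool parse_literal(const std::string& in, std::size_t& pos, const std::string& lit) {
    if (pos > in.size() || in.compare(pos, lit.size(), lit) != 0) return false;
    pos += lit.size();
    return true;
}

// 1*DIGIT: one or more decimal digits. Advances pos only on a match.
bool parse_digits(const std::string& in, std::size_t& pos, unsigned& value) {
    std::size_t p = pos;
    unsigned v = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) {
        v = v * 10 + static_cast<unsigned>(in[p] - '0');
        ++p;
    }
    if (p == pos) return false;  // the rule requires at least one digit
    pos = p;
    value = v;
    return true;
}

// HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
// Each element of the rule becomes one call, in the same order as the BNF.
bool parse_http_version(const std::string& in, std::size_t& pos, HttpVersion& out) {
    std::size_t p = pos;
    HttpVersion v;
    if (parse_literal(in, p, "HTTP") && parse_literal(in, p, "/") &&
        parse_digits(in, p, v.major_version) && parse_literal(in, p, ".") &&
        parse_digits(in, p, v.minor_version)) {
        pos = p;
        out = v;
        return true;
    }
    return false;  // pos and out are left untouched on failure
}
```

A parser library such as Spirit lets the same sequence be written as a single grammar expression; the hand-written form just makes the one-to-one mapping from rule elements to code visible.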
I would say that this is on-topic, as it is an issue that we face in implementing cpp-netlib. Currently, the request parser in the HTTP server is taken from the Boost.Asio HTTP example, but I'm certain that this can be improved.
Let me chime in, as I've recently developed an Asio-based HTTP server as well.
First, Spirit is unsuitable for the task - it consumes all the input in one pass, and doesn't support the case when the HTTP request arrives in more than one read. The real solution is a state-machine-based parser, just like the one in the Asio HTTP example.
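As a rough illustration of the state-machine approach, here is a minimal sketch that parses only the request line and keeps its state between calls, so the input may arrive in any number of reads. It is not the Asio example's parser; the class RequestLineParser and its members are hypothetical, and the character checks are deliberately simplified compared with the RFC grammar.

```cpp
#include <cstddef>
#include <string>

// Hypothetical incremental parser for "METHOD SP TARGET SP VERSION CRLF".
class RequestLineParser {
public:
    enum class Result { NeedMore, Done, Error };

    // Feed the next chunk of bytes; may be called once per read.
    Result feed(const char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            const Result r = consume(data[i]);
            if (r != Result::NeedMore) return r;
        }
        return Result::NeedMore;
    }

    const std::string& method() const { return method_; }
    const std::string& target() const { return target_; }
    const std::string& version() const { return version_; }

private:
    enum class State { Method, Target, Version, ExpectLF, Done, Failed };

    Result fail() {
        state_ = State::Failed;
        return Result::Error;
    }

    // Advance the state machine by exactly one input character.
    Result consume(char c) {
        switch (state_) {
        case State::Method:
            if (c == ' ') {
                if (method_.empty()) return fail();
                state_ = State::Target;
            } else if (c == '\r' || c == '\n') {
                return fail();
            } else {
                method_ += c;
            }
            return Result::NeedMore;
        case State::Target:
            if (c == ' ') {
                if (target_.empty()) return fail();
                state_ = State::Version;
            } else if (c == '\r' || c == '\n') {
                return fail();
            } else {
                target_ += c;
            }
            return Result::NeedMore;
        case State::Version:
            if (c == '\r') {
                if (version_.empty()) return fail();
                state_ = State::ExpectLF;
            } else if (c == '\n') {
                return fail();
            } else {
                version_ += c;
            }
            return Result::NeedMore;
        case State::ExpectLF:
            if (c != '\n') return fail();
            state_ = State::Done;
            return Result::Done;
        case State::Done:
            return Result::Done;
        case State::Failed:
            return Result::Error;
        }
        return Result::Error;
    }

    State state_ = State::Method;
    std::string method_, target_, version_;
};
```

Because all progress is stored in state_ and the three strings, a read handler can call feed() with whatever bytes it has received and simply wait for more while the result is NeedMore.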
I disagree in general. My parser is primarily an HTTP request _header_ parser, and the headers are normally relatively small. For most requests (i.e. GETs) the request body doesn't add much, and in those cases it is likely that the whole request can be got in a single read. In fact browser implementations go to some lengths to make their requests fit in single network packets (about 1500 bytes) for performance reasons, and single network packets will generally be accessible as single reads.

I normally use this code in a thread-per-connection environment, but if you wanted to use it in a single-threaded system you would need to modify it to detect incomplete input in the (rare) case when the input was split over multiple packets.

In the case of HTTP POST and PUT requests, on the other hand, the body (but not the header) can be large, and parsing it incrementally as it arrives is probably necessary. I noticed a BoostCon paper about a MIME parser (Marshall?) - this would definitely benefit from working incrementally in many applications.
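One simple way to detect incomplete input before handing the buffer to a one-pass header parser is to look for the blank line that ends the header block. The following is only a sketch of that idea, not code from the parser discussed above; the function name header_block_length is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns the length of the header block, including the terminating
// CRLF CRLF, or std::string::npos if the blank line has not arrived yet.
std::size_t header_block_length(const std::string& buffer) {
    const std::size_t end = buffer.find("\r\n\r\n");
    return end == std::string::npos ? std::string::npos : end + 4;
}
```

A caller would append each read to the buffer and run the one-pass parser on the first header_block_length(buffer) bytes only once the result is no longer npos; anything after that offset is the start of the body.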
In my case, I used an automatically generated parser from EBNF, via Ragel (http://www.complang.org/ragel/). The grammar itself I "borrowed" from the sandbox version of Lighttpd, which uses the same approach. Link:
http://redmine.lighttpd.net/projects/lighttpd-sandbox/repository/revisions/m...
Ragel is the best solution I'm aware of, and it's easy to integrate its output into Boost-style C++ code. I've not yet benchmarked my solution against the Asio HTTP example parser for performance, but I assume they are close.
This is interesting, and I'll have a look at it next time I need to do some BNF-like parsing.

Regards, Phil.