
I am currently using in a project a custom-writen recursive-decent parser, which uses an filtering_istream so I can support compressed input. The only functionality used is peekchar, getchar and integer extraction with 'stream >> i'. I am finding that C++ streams, particularly when it comes to integer extraction, are giving terrible performance. In a small test, uncompressing to a buffer and then writing a simple custom reader on that memory buffer gives 10x performance. I would perfer not to write and support such a tool if possible. Is there a boost library which easily supports a parser with 1 character lookahead and simple things like integer extraction? Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative? Chris

I am currently using in a project a custom-writen recursive-decent parser, which uses an filtering_istream so I can support compressed input.
The only functionality used is peekchar, getchar and integer extraction with 'stream >> i'.
I am finding that C++ streams, particularly when it comes to integer extraction, are giving terrible performance. In a small test, uncompressing to a buffer and then writing a simple custom reader on that memory buffer gives 10x performance.
I would perfer not to write and support such a tool if possible. Is there a boost library which easily supports a parser with 1 character lookahead and simple things like integer extraction?
Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative?
Spirit (which is recursive decent) is definitely your friend. Especially the new version (V2.1), which is to be released with Boost V1.41, but in SVN trunk now and quite mature already. Parsing an integer from a buffer of characters is as easy as: std::string buffer("1234"); int value = 0; using namespace boost::spirit; bool r = qi::parse(buffer.begin(), buffer.end(), int_, value); assert(r && value == 1234); And BTW: measurements show that the Spirit int_ parser is faster than atoi (gcc/vc using their respective crtlib)! Regards Hartmut

Christopher Jefferson wrote:
Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative?
Spirit is well tuned for small parsing tasks like this. It is a modular RD parser. What you need is what you pay for. The code is as tight as it can be. Try the int parsing examples and see the generated assembler. One of Spirit's original goal is for such micro-parsing. You don't have to write an RD parser, Spirit is an RD parser. However, if you need to use an existing RD parser, then good! You can use any or all of Spirit facilities as-is by calling Spirit's parse functions (which accept forward iterators) from your RD environment. Or, the other way around, you can write a custom-parser in spirit that calls your RD parser. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

At 05:49 PM 6/21/2009, you wrote:
Christopher Jefferson wrote:
Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative?
Spirit is well tuned for small parsing tasks like this. It is a modular RD parser. What you need is what you pay for. The code is as tight as it can be.
I don't think this solves his problem. Note that he got a 10X speed up by changing to a buffer with his existing parser, so the _parsing_ code isn't the bottle neck. It's better I/O that's needed. I suspect it's the locking done by streams on each operation, so he's basically doing a lock on every character. I think he'd be better off using the forward iterator idea and then either writing a small wrapper class on streams to block read or use the lower level read buffer interface.

Alan M. Carroll wrote:
At 05:49 PM 6/21/2009, you wrote:
Christopher Jefferson wrote:
Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative? Spirit is well tuned for small parsing tasks like this. It is a modular RD parser. What you need is what you pay for. The code is as tight as it can be.
I don't think this solves his problem.
What is the problem? Speed, right? Then I don't see why it does not solve the problem.
Note that he got a 10X speed up by changing to a buffer with his existing parser, so the _parsing_ code isn't the bottle neck. It's better I/O that's needed. I suspect
And Spirit uses a better I/O through generic Forward Iterators.
it's the locking done by streams on each operation, so he's basically doing a lock on every character. I think he'd be better off using the forward iterator idea and then either writing a small wrapper class on streams to block read or use the lower level read buffer interface.
I think you are under-estimating the complexity of writing a humble number parser from a Forward Iterator. It's not as simple as it looks and gets pretty hairy when you get past the toy examples. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

On 22 Jun 2009, at 16:11, Joel de Guzman wrote:
Alan M. Carroll wrote:
At 05:49 PM 6/21/2009, you wrote:
Christopher Jefferson wrote:
Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative? Spirit is well tuned for small parsing tasks like this. It is a modular RD parser. What you need is what you pay for. The code is as tight as it can be. I don't think this solves his problem.
What is the problem? Speed, right? Then I don't see why it does not solve the problem.
Note that he got a 10X speed up by changing to a buffer with his existing parser, so the _parsing_ code isn't the bottle neck. It's better I/O that's needed. I suspect
And Spirit uses a better I/O through generic Forward Iterators.
it's the locking done by streams on each operation, so he's basically doing a lock on every character. I think he'd be better off using the forward iterator idea and then either writing a small wrapper class on streams to block read or use the lower level read buffer interface.
I think you are under-estimating the complexity of writing a humble number parser from a Forward Iterator. It's not as simple as it looks and gets pretty hairy when you get past the toy examples.
Yes, this is my concern. As a proof of principle I wrote my own number generator, but it's only passing about 30% of our internal tests. This is enough to convince me, possibly falsely, it works "in principle" and could be fixed without too much loss of speed, but I worry how much work that might be. (For those curious, a quick glance suggests the two main problems are not handling negative numbers or recognising overflow. Both easily fixable, but I don't feel necessary for an experiment). Chris
participants (4)
-
Alan M. Carroll
-
Christopher Jefferson
-
Hartmut Kaiser
-
Joel de Guzman