Is there anything wrong with the Gmane newsgroup? + Interested in parsing tools and networking code
Hello,
I just joined the mailing list - Greetings to all.
I have been trying for several days to post messages to the Boost Users newsgroup on Gmane. I didn't get any error message, but my postings never appeared.
Boost is one of the best resources I have found in a very long time - Thanks for the effort!
One of my main current interests is parsing. Trying to decide among the choices:
- Regex
- Spirit
- Xpressive
I could use some help with ASIO, too.
-Ramon
On Sat, Sep 12, 2009 at 12:23 PM, Ramon F Herrera
Hello,
I just joined the mailing list - Greetings to all.
I have been trying for several days to post messages to the Boost Users newsgroup on Gmane. I didn't get any error message, but my postings never appeared.
You are subscribed to the group first and foremost?
On Sat, Sep 12, 2009 at 12:23 PM, Ramon F Herrera
Boost is one of the best resources I have found in a very long time - Thanks for the effort!
One of my main current interests is parsing. Trying to decide among the choices:
- Regex - Spirit - Xpressive
Depends on what you want to parse. If you want to do, say, a
search and replace in a file, Xpressive is best; if you want to parse
data structures and you want the absolute best speed and a completely
unambiguous grammar, Spirit2.1 for sure. Do not bother with Regex
itself, as Xpressive can do everything Regex can, and more.
On Sat, Sep 12, 2009 at 12:23 PM, Ramon F Herrera
I could use some help with ASIO, too.
Ask away. :)
OvermindDL1 wrote:
One of my main current interests is parsing. Trying to decide among the choices:
- Regex - Spirit - Xpressive
Depends on what you are wanting to parse. If you want to do, say, a search and replace in a file, Xpressive is best, if you want to parse data structures and you want the absolute best speed and a completely unambiguous grammar, Spirit2.1 for sure. Do not bother with Regex itself as Xpressive can do everything Regex can, but more and better.
Thanks so much, OvermindDL1!
Allow me to describe my target data. I initially had a bunch of files
with lines like this:
Variable Name = Variable Value
These are some examples:
--------------------------------------------------------------------
My Favorite Baseball Player = George Herman "Babe" Ruth
What did you do on Christmas = I rested, computed the % mortgage and visited my brother + sister.
Favorite Curse = That umpire is a #&*%!
--------------------------------------------------------------------
I quickly solved the above parsing with Regex like this:
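string variable = "([A-Za-z0-9][\\w\\h\\(\\)\\-\\.,/&]*)";  // LHS: the variable name
char equal_sign = '=';                                        // separator between LHS and RHS
string value = "(.+)";                                        // RHS: the rest of the line
assignment = variable + equal_sign + value;                   // pattern for one whole line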
After retrieving the LHS and the RHS I store them for subsequent use in
a map data structure.
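A minimal sketch of the approach described above, for readers who want to try it without Boost. It swaps in std::regex for Boost.Regex, which is an assumption of this sketch and not what the post used; because std::regex's default ECMAScript grammar has no \h, the character class uses [ \t] in its place. The function names parse_assignment and load_assignments are hypothetical.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: splits one "Name = Value" line into its two sides.
// The name must start with a letter or digit; the lazy *? keeps trailing
// blanks before '=' out of the captured name.
bool parse_assignment(const std::string& line,
                      std::string& name, std::string& value) {
    static const std::regex assignment(
        R"(([A-Za-z0-9][\w \t()\-.,/&]*?)[ \t]*=[ \t]*(.+))");
    std::smatch m;
    if (!std::regex_match(line, m, assignment)) return false;
    name = m[1].str();
    value = m[2].str();
    return true;
}

// Reads every line of the stream and stores LHS -> RHS in a map.
// Lines that do not look like an assignment are skipped.
std::map<std::string, std::string> load_assignments(std::istream& in) {
    std::map<std::string, std::string> table;
    std::string line, name, value;
    while (std::getline(in, line)) {
        if (parse_assignment(line, name, value)) table[name] = value;
    }
    return table;
}
```

Feeding the sample lines through a std::istringstream and calling load_assignments should give one map entry per line, keyed by the text to the left of the first equal sign.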
On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera
[...]
After retrieving the LHS and the RHS I store them for subsequent use in a map data structure. My data, however, just became a bit more challenging. It is now divided into blocks:
[Unique ID 1]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
[Unique ID 2]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
[Unique ID 3]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
(etc.)
Again, I would like to store the new format in a map, using the Unique ID as key to retrieve the block of lines underneath each ID.
Actually, that kind of stuff is very easy to do in Spirit2.1 (in the
Boost trunk or Boost 1.41): it can auto-fill your structures and
everything, and it is very fast.
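To make the target data structure concrete, here is a rough sketch of the block format loaded into a nested map keyed by the unique ID. It is deliberately not Spirit: it uses only the standard library (std::regex and std::map) so that it stays self-contained, and the names Block, Blocks and load_blocks are hypothetical. A Spirit2.1 grammar would replace the line-by-line loop.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <sstream>
#include <string>

using Block  = std::map<std::string, std::string>;  // variable name -> value
using Blocks = std::map<std::string, Block>;        // unique ID -> its block

// Reads "[ID]" header lines and the "Name = Value" lines underneath them.
// Assignments that appear before the first header are ignored.
Blocks load_blocks(std::istream& in) {
    static const std::regex header(R"(\[([^\]]+)\][ \t]*)");
    static const std::regex assignment(
        R"(([^=]*[^= \t])[ \t]*=[ \t]*(.+))");
    Blocks blocks;
    Block* current = nullptr;  // block that the following lines belong to
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_match(line, m, header)) {
            current = &blocks[m[1].str()];  // map references stay valid
        } else if (current && std::regex_match(line, m, assignment)) {
            (*current)[m[1].str()] = m[2].str();
        }
    }
    return blocks;
}
```

Looking up blocks.at("Unique ID 2") then returns the whole set of assignments listed under that ID, which is the retrieval pattern asked for above.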
On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera
At this stage, I am wondering whether to continue using tried and true (and learned!) Regex, or get my feet wet with more powerful tools, such as the one recommended by Overmind (Xpressive).
As stated, Xpressive can do all Regex can do, but you can also do
static regexes (written as C++ expressions and compiled with your code,
much faster than a string regex). Spirit2.1 would still be a lot faster
overall (it has been timed against a lot of things, and it blows even
Xpressive's static parsers away).
On Sat, Sep 12, 2009 at 1:40 PM, Ramon F Herrera
How does Xpressive compare with ANTLR? I am torn between them.
Xpressive and ANTLR are two different things. ANTLR is like a not-as-powerful-and-slower Spirit2.1, a full grammar parser, whereas Xpressive is just a regex parser.
This makes me wonder how Xpressive and Spirit compare. Both do
compiled parsing statements, right?
Can Spirit somehow be seen as Xpressive + more? Why use Xpressive at all then?
Best,
Dee
On Sun, Sep 13, 2009 at 5:59 AM, OvermindDL1 wrote:
[...]
Diederick C. Niehorster wrote:
This makes me wonder how Xpressive and Spirit compare, both do compiled parsing statements right?
Can Spirit somehow be seen as Xpressive + more? Why use Xpressive at all then?
Best, Dee
Hi Dee,
While I am far from being an expert (and hope to read answers from people more qualified than myself), I was wondering the same thing.
I venture a guess. You draw the line here:
Several tools (Regex, Xpressive, Perl) can grab data based on regular expressions. They can validate whether some statement is a correct expression of the target "language".
The more advanced tools, however, can actually put that data into action. The passive data becomes a set of executable statements using Spirit, ANTLR, etc. You specify things like: "every time you find the verb 'such-and-such', call my function XYZ with the needed parameters".
Callback functions are the dividing line.
-Ramon
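The callback idea can be sketched in plain standard C++. This is only an illustration of "call my function when you see this verb"; it is not the API of Spirit, ANTLR or Xpressive, and the names Action and run_actions, as well as the "verb argument" line layout, are assumptions made for the example.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical callback type: receives the text that follows the verb.
using Action = std::function<void(const std::string& argument)>;

// Reads "verb argument" lines and fires the callback registered for each
// verb. Lines whose verb has no callback are skipped.
// Returns the number of callbacks that were fired.
int run_actions(std::istream& in, const std::map<std::string, Action>& actions) {
    int fired = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string verb, argument;
        fields >> verb;                             // first word is the verb
        std::getline(fields >> std::ws, argument);  // rest of the line
        auto it = actions.find(verb);
        if (it != actions.end()) {
            it->second(argument);
            ++fired;
        }
    }
    return fired;
}
```

The parsing step here is trivial on purpose; the point is that the matched input is handed to user code at the moment it is recognized, instead of only being validated or captured.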
On Sat, Sep 12, 2009 at 7:04 PM, Ramon F Herrera
[...]
Callback functions are the dividing line.
Static Xpressive (not dynamic Xpressive; you can think of dynamic
Xpressive as being exactly Boost.Regex) and Spirit are both C++ DSELs
and compile to rather fast code, but Xpressive is a regex parser, and
as such has limitations, whereas Spirit2.1 is a PEG (Parsing Expression
Grammar, as I recall; wiki it) parser. PEGs are nice in that they
are unambiguous, they are faster, they have unlimited lookahead,
etc. Spirit can also be bound to just about anything in any way in
the C++ world, with built-in parsers for a ton of things (everything
from PODs to the STL to many Boost libraries like Fusion and such),
and it is quite easy to make your own new things as well. For a
comparison of Spirit2.1 with ANTLR: Spirit2.1 has been shown to be
faster in execution speed, the code is a great deal shorter, and
ANTLR's actions pale in comparison to Spirit2.1's versions, plus
you do not need to pre-parse code with an external app like you
have to do with ANTLR.
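The "unambiguous" property of PEGs comes from ordered choice: alternatives are tried left to right and the first success is final. The following hand-written sketch shows only that one rule, using string literals as the alternatives. It is not Spirit code, and match_literal and ordered_choice are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Tries to match `literal` at position `pos` of `input`.
// Returns the position just past the match, or npos on failure.
std::size_t match_literal(const std::string& input, std::size_t pos,
                          const std::string& literal) {
    if (input.compare(pos, literal.size(), literal) == 0) {
        return pos + literal.size();
    }
    return std::string::npos;
}

// PEG ordered choice (a / b / ...): the alternatives are tried in order
// and the first one that matches wins. Later alternatives are never
// considered, so a given input has at most one parse.
std::size_t ordered_choice(const std::string& input, std::size_t pos,
                           const std::vector<std::string>& alternatives) {
    for (const std::string& alt : alternatives) {
        const std::size_t next = match_literal(input, pos, alt);
        if (next != std::string::npos) return next;
    }
    return std::string::npos;
}
```

With the input "ab", the choice "a" / "ab" consumes one character and the choice "ab" / "a" consumes two, so the order of alternatives is part of the grammar's meaning.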
But yea, to learn what Spirit2.1 is built off of, look up PEGs on
Wikipedia, and yet Spirit2.1 is still so much more powerful than that.
The documentation for it is in trunk. But as an example, I wrote
this grammar a few hours ago; it is a relatively nasty looking one
that I really should break up into easier to read parts, but it works
quite well in my testing thus far:
static boost::spirit::qi::rule
[...]
(+(uri_decode_rule-char_(";,?/#")))%lit(',')))%lit(';')) // params (ex: ;param=val1,val2,val3;p2)
) % lit('/')) >> // path
-(lit('?') >> -((+(uri_decode_rule-char_("&=;#")) >> -(lit('=') >> +(uri_decode_rule-char_("&;#")))) % omit[char_("&;")] >> omit[-char_("&;")])) >> // query
-(lit('#') >> *uri_decode_rule) // fragment
)
It parses a URI (a type that only my system will generate, so I am not sure it follows the spec exactly, but it works very well for my purpose, and it is very fast). The only thing it does is parse all the info into this:
struct uri
{
    std::string scheme;
    std::string netloc;
    typedef std::vector< std::pair< std::string, std::vector<std::string> > > params_t;
    typedef std::vector< std::pair<std::string, params_t> > path_t;
    path_t path;
    typedef std::vector< std::pair<std::string, std::string> > query_t;
    query_t query;
    std::string fragment;
};
OvermindDL1 wrote:
Static Xpressive (not dynamic Xpressive; you can think of dynamic Xpressive as being exactly Boost.Regex) and Spirit are both C++ DSELs and compile to rather fast code, but Xpressive is a regex parser, and as such has limitations, whereas Spirit2.1 is a PEG (Parsing Expression Grammar, as I recall; wiki it) parser.
To be fair, regexes have advantages too, like non-greedy loops and full backtracking. In general, PEG and Regex have different characteristics; if you add LL, LALR, and LR into the mix, then you have more parsing/text processing creatures, all with different characteristics.
Sometimes the lines blur. For instance, traditionally, regexes cannot handle recursive parsing such as parsing xml/html. Xpressive, however, got beyond that limitation. Also, while Spirit is static (at the moment), boost.regex is dynamic and boost.xpressive can be both static and dynamic. IOW, you can define your regexes at runtime.
So bottom line: use the right tool for the job.
Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
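Two of the regex traits mentioned above, non-greedy loops and patterns defined at run time, can be seen in a few lines. The sketch uses std::regex as a stand-in for boost.regex and dynamic boost.xpressive (an assumption of this example), and first_match is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Compiles `pattern` at run time (a "dynamic" regex) and returns the first
// match found in `text`, or an empty string if there is none.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m[0].str() : std::string();
}
```

On the text "<a><b>", the greedy pattern "<.+>" matches the whole string, while the non-greedy "<.+?>" stops at the first closing bracket and matches only "<a>". Because the pattern is an ordinary string, it could just as well come from a config file or user input.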
OvermindDL1 wrote:
But yea, to learn what Spirit2.1 is built off of, look up PEGs on Wikipedia, and yet Spirit2.1 is still so much more powerful than that. The documentation for it is in trunk. But as an example, I wrote this grammar a few hours ago; it is a relatively nasty looking one that I really should break up into easier to read parts, but it works quite well in my testing thus far:
[snip code]
It would help if you use fewer indent spaces and using declarations ;-)
Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
On Sat, Sep 12, 2009 at 11:57 PM, Joel de Guzman
[snip code]
It would help if you use fewer indent spaces and using declarations ;-)
It was in a self-contained function in a cpp file; I tend to use using a lot in those cases, whereas I *never* use using in a header file. :) And the function was a few namespaces deep, causing it to get deeply indented in my IDE. :)
Ramon F Herrera wrote:
Allow me to describe my target data. I initially had a bunch of files with lines like this:
[...]
By the looks of your description, Spirit2 sounds like just the right tool. Check out the tutorial: http://tinyurl.com/ozdsjo
Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
participants (4): Diederick C. Niehorster, Joel de Guzman, OvermindDL1, Ramon F Herrera