
----- Original Message -----
From: "Eric Niebler" <eric@boost-consulting.com>
To: <boost@lists.boost.org>
Sent: Thursday, September 15, 2005 3:05 PM
Subject: Re: [boost] [Review] xpressive
Answers inline...
Thanks.
1. What is the benefit of providing the complete match in the first entry of the results, e.g. "what[0]"? While this is consistent with a long tradition in RE, after some time with STL its presence at position zero wasn't as comfortable as I expected.
I'm curious, what did your experience with STL lead you to expect?
I did it this way because TR1 regex does it that way. Although xpressive is not a fully compliant TR1 regex implementation, minimizing gratuitous differences can only help.
Yep, agreed.

Going back to the "what[0] / STL" thing and starting with your (snipped) example:

    std::string hello( "hello world!" );
    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
    smatch what;
    if( regex_match( hello, what, rex ) )
    {
        std::cout << what[0] << '\n'; // whole match
        std::cout << what[1] << '\n'; // first capture
        std::cout << what[2] << '\n'; // second capture
    }

"what[0]" is the odd one out; it does not have an implicit mapping to a manifest sub-expression. To RE-philes (I think my first exposure to $0 was in "vi"?) it's de rigueur. To those C++ developers who were born more recently but are familiar with STL, it's a wrinkle. Does processing of "what" always involve "++what.begin()" only because "what.complete()" fails to compete with tradition? Please don't take my quoted code snippets literally. Or imagine I side with the next generation :-)
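To make the "++what.begin()" point concrete, here is a minimal sketch. It uses std::regex (the standardized descendant of TR1 regex) as a stand-in for xpressive, on the assumption that both put the whole match at index 0 as described above. The helper name captures_only is made up for illustration and is not part of either library.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns only the captures (what[1], what[2], ...),
// skipping the whole match stored in what[0].
// Returns an empty vector when the input as a whole does not match.
std::vector<std::string> captures_only(const std::string& input,
                                       const std::regex& rex)
{
    std::vector<std::string> out;
    std::smatch what;
    if (std::regex_match(input, what, rex)) {
        // what[0] is the complete match, so start one past begin().
        for (auto it = std::next(what.begin()); it != what.end(); ++it)
            out.push_back(it->str());
    }
    return out;
}
```

Called with "hello world!" and the pattern "(\\w+) (\\w+)!", this yields the two words and nothing else; every caller that wants "just the captures" has to remember the same one-step skip.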
2. Why the slash syntax in dynamic regex? The resulting requirement for a double slash is fairly ugly. It may be consistent with something (Perl/ECMA/..?) but on balance is it worth it?
I'm following the lead of every other regex package for C and C++ out there. Anything else, and there would be riots in the streets. I agree that the double-slashes are hard on the eyes, though. (So use static regexes instead. :-)
Ha, cool.
3. Why ">>" and not "," (comma)? Did the "set" facility take priority, or does the low precedence of comma just result in a different ugliness (sorry, not really the word I want to use :-)?
As Joel already said, operator precedence. Also, I completely stole Spirit's choice of operators, lock, stock and barrel. That's a conscious decision (made after much debate and hand-wringing) to ease any future unification, and so that Spirit users can be productive with xpressive with a minimum of fuss.
Yep, sorry to have missed the evolution of Spirit. I'm a fairly recent Booster who only bothered to search my archives for xpressive before writing the review. Seems kinda dumb now; hope to do better next time.
5. There didn't appear to be much specific thought given to file processing. Is this another "not yet implemented"? In particular, elegant integration with any async I/O facility arising from sockets and file I/O initiatives.
xpressive works generically with iterators. Spirit has a file iterator. That would be the way to go, IMO.
For "normal" file processing this is fine. Well, actually, it's marvelous. But for another circumstance see below.
6. Very interested in the future of "semantic actions". Actions and file processing probably go together?
They're orthogonal, AFAICT.
Yes they are. But I need to be clearer. I was associating files of input with semantic actions because processing of a file with xpr has a good chance of involving a complex xpr. And getting the right code to run at the right time with such an xpr, without embedded actions, involves contortions (even unnecessary CPU cycles?). I'm sure you are fully aware of all this. Sorry, it was an idle association.

Also, a recurring problem with related tools such as lex, flex, yacc and bison is that they are architected to be "superior" to the "subordinate" input/buffering scheme. On one hand, this is great because in a traditional parser it hid a significant sub-system and often did an efficient job of it. OTOH it is often difficult/impossible to present data blocks to such a parser in an async fashion. A role reversal is required. Borrowing your example again:

    // Sometime before establishing a TCP connection
    sregex rex = sregex::compile( "(\\w+) (\\w+)\\n" ); // Two words per line
    smatch what;

    // On an FD_READ
    // Load available bytes into char buffer[] and;
    while( regex_accumulate( buffer, what, rex ) )
    {
        // The pattern has been matched.
        // This loop body may be entered 0
        // or more times, for each FD_READ
        string command = what[ 1 ];
        string argument = what[ 2 ];
    }

Structured this way, the application processing the commands is completely impervious to changing MTUs and block sizes. But something needs to carry the xpressive state between invocations of "regex_accumulate"? Hell, would the xpr lib work as is!?

Cheers.
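One way to approximate the "regex_accumulate" idea without suspending the matcher is to carry the unconsumed bytes, rather than the regex state, between reads. The sketch below does that with std::regex as a stand-in for xpressive. The class command_accumulator and its feed member are hypothetical names, and treating '\n' as the record terminator is an assumption taken from the two-words-per-line example above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of xpressive or the standard library.
// It buffers partial input between reads and matches only complete lines.
class command_accumulator
{
public:
    typedef std::pair<std::string, std::string> command; // (command, argument)

    // Append one block of received bytes and return every complete
    // "word word" line seen so far. May return zero or more commands.
    std::vector<command> feed(const char* data, std::size_t size)
    {
        pending_.append(data, size);

        std::vector<command> out;
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos)
                break;                      // incomplete line: keep for later
            const std::string line = pending_.substr(start, nl - start);
            std::smatch what;
            if (std::regex_match(line, what, rex_))
                out.push_back(command(what[1].str(), what[2].str()));
            // Lines that do not match are silently dropped in this sketch.
            start = nl + 1;
        }
        pending_.erase(0, start);           // discard the consumed lines
        return out;
    }

private:
    std::regex rex_{"(\\w+) (\\w+)"};       // two words per line
    std::string pending_;                   // bytes carried between reads
};
```

Feeding "get fi" and then "le\n" produces nothing on the first call and ("get", "file") on the second, so the caller does not see block boundaries. The cost is re-buffering: a matcher that could truly pause mid-pattern and resume on the next block would avoid holding the partial line, and that is the part this sketch does not attempt.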