xpressive and white space skipping


Jorge Lodos Vigil

23 Nov 2007

1:10 p.m.

Hi,

We are using xpressive with a grammar to match certain patterns. In some cases, we need to ignore whitespace. Using dynamic regexes, this can be achieved with the ignore_white_space constant. Is there a way to ignore whitespace using a grammar in xpressive other than modifying the grammar itself? We know Spirit is an option, but we need to evaluate parsing speed with as many methods as possible.

Thanks in advance.

Cheers,
Jorge


Dave Jenkins

23 Nov

3:47 p.m.

"Jorge Lodos Vigil" <lodos@segurmatica.cu> wrote in message news:ECBF993526E3BC47BD232BDC77D6933706A916F4E9@mercurio.segurmatica.cu...


Hi We are using xpressive with a grammar to match certain patterns. In some cases, we have the need to ignore white spaces. Using dynamic regexes, this can be achieved with the ignore_white_space constant. Is there a way to ignore white spaces using a grammar in xpressive other than modifying the grammar itself?

How about using a filter_iterator to skip spaces? Something like the following program:

#include <iostream>
#include <string>
#include <boost/config.hpp>
#include <boost/iterator/filter_iterator.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/xpressive/xpressive_static.hpp>

struct not_space
{
    inline bool operator()(char ch) const
    {
        return ' ' != ch;
    }
};

typedef boost::filter_iterator<not_space, std::string::const_iterator> Iter_Skip;

boost::iterator_range<Iter_Skip> make_range (std::string& s)
{
    return boost::make_iterator_range(
        boost::make_filter_iterator<not_space>(s.begin(), s.end()),
        boost::make_filter_iterator<not_space>(s.end(), s.end())
    );
}

int main()
{
    using namespace boost::xpressive;

    std::string s = "aa bb cc";

    typedef basic_regex<Iter_Skip> regex_skip;
    regex_skip rx = as_xpr("aa") >> "bbcc";
    match_results<Iter_Skip> what;

    if(!regex_match(make_range(s), what, rx))
        std::cout << "not found\n";
    else
        std::cout << "found\n";
    return 0;
}
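For readers without Boost at hand, the traversal idea can be sketched with the standard library alone. This is only an illustration under assumptions: skip_spaces_from and equals_ignoring_spaces are hypothetical names, and the comparison against a literal stands in for an xpressive regex applied to the filtered range.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: advance `pos` past any spaces in `text`.
inline std::size_t skip_spaces_from(const std::string& text, std::size_t pos) {
    while (pos < text.size() && text[pos] == ' ') ++pos;
    return pos;
}

// Returns true when `pattern` equals `text` with every space in `text` ignored.
// Like regex_match over a filtered range, the whole input must be consumed.
inline bool equals_ignoring_spaces(const std::string& text, const std::string& pattern) {
    std::size_t t = skip_spaces_from(text, 0);
    for (char expected : pattern) {
        if (t == text.size() || text[t] != expected) return false;
        t = skip_spaces_from(text, t + 1);
    }
    return t == text.size();
}
```

With this sketch, equals_ignoring_spaces("aa bb cc", "aabbcc") holds, while a pattern that itself contains a space can never match, because the traversal never yields one.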

Jorge Lodos Vigil

5:43 p.m.


Thank you for your help. Unfortunately, our grammar definition contains spaces, and we cannot remove all of them. I'll think more about your idea of modifying the way we traverse the string; perhaps we can do some sort of preprocessing.

Thanks again!

Cheers,
Jorge

Dave Jenkins

8:25 p.m.

"Jorge Lodos Vigil" <lodos@segurmatica.cu> wrote in message news:ECBF993526E3BC47BD232BDC77D6933706A916F588@mercurio.segurmatica.cu...

Thank you for your help. Unfortunately, our grammar definition contains spaces, and we cannot remove all of them. I'll think more about your idea of modifying the way we traverse the string; perhaps we can do some sort of preprocessing.

Oh, I see. How about something like this? It allows you to selectively skip spaces for parts of your regex.

#include <iostream>
#include <string>
#include <boost/iterator/filter_iterator.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/regex_actions.hpp>
#include <boost/xpressive/proto/proto_typeof.hpp>

static bool skip_spaces = false;

struct not_space
{
    inline bool operator()(char ch) const
    {
        return ' ' != ch || !skip_spaces;
    }
    static bool disable(std::string const & s)
    {
        skip_spaces = false;
        return true;
    }
    static bool enable(std::string const & s)
    {
        skip_spaces = true;
        return true;
    }
};

typedef boost::filter_iterator<not_space, std::string::const_iterator> Iter_Skip;

boost::iterator_range<Iter_Skip> make_range (std::string const& s)
{
    return boost::make_iterator_range(
        boost::make_filter_iterator<not_space>(s.begin(), s.end()),
        boost::make_filter_iterator<not_space>(s.end(), s.end())
    );
}

int main()
{
    using namespace boost::xpressive;

    std::string s = "a ab bc c";

    BOOST_PROTO_AUTO( skip_spaces, nil[check(&not_space::enable)] );
    BOOST_PROTO_AUTO( normal, nil[check(&not_space::disable)] );

    typedef basic_regex<Iter_Skip> regex_skip;
    regex_skip rx = normal >> as_xpr("a a") >>
        skip_spaces >> "bb" >>
        normal >> "c c";
    match_results<Iter_Skip> what;

    if(!regex_match(make_range(s), what, rx))
        std::cout << "not found\n";
    else
        std::cout << "found\n";
    return 0;
}
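The predicate logic in the program above can be tried in isolation, without xpressive. The following is a minimal sketch using only the standard library; toggled_not_space and filtered_copy are hypothetical names, and the flag is passed in explicitly rather than read from a global.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical stand-in for not_space: it keeps every character unless the
// shared flag is set, in which case spaces are dropped.
struct toggled_not_space {
    const bool* skip;  // points at the flag that switches skipping on and off
    bool operator()(char ch) const { return ch != ' ' || !*skip; }
};

// Copy `text`, filtering it through the predicate with the flag's current value.
inline std::string filtered_copy(const std::string& text, const bool& skip_flag) {
    std::string out;
    std::copy_if(text.begin(), text.end(), std::back_inserter(out),
                 toggled_not_space{&skip_flag});
    return out;
}
```

With the flag cleared the input passes through unchanged; with it set, "a ab bc c" is seen as "aabbcc". The result depends entirely on the flag's value at the moment each character is visited.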

Jorge Lodos Vigil

8:51 p.m.


I have to thank you again; this is a better solution than the preprocessing I was thinking about. This solves our problem. The only caveat is that Proto is not yet a Boost library, but I hope that, once it is accepted, this will require only a header change and BOOST_PROTO_AUTO will not change.

So far you have come up with two different ideas: first, modifying the way we traverse the sequence, and now an elegant way of modifying the grammar. There is still the alternative of modifying the algorithm to use an additional skip regex. I wonder which option would perform best for arbitrary texts. Could you (or someone else) shed some light on this?

Thanks once more!

Cheers,
Jorge

Dave Jenkins

9:16 p.m.

"Jorge Lodos Vigil" <lodos@segurmatica.cu> wrote in message news:ECBF993526E3BC47BD232BDC77D6933706A916F5E6@mercurio.segurmatica.cu...

I have to thank you again; this is a better solution than the preprocessing I was thinking about. This solves our problem. The only caveat is that Proto is not yet a Boost library, but I hope that, once it is accepted, this will require only a header change and BOOST_PROTO_AUTO will not change.

So far you have come up with two different ideas: first, modifying the way we traverse the sequence, and now an elegant way of modifying the grammar. There is still the alternative of modifying the algorithm to use an additional skip regex. I wonder which option would perform best for arbitrary texts. Could you (or someone else) shed some light on this?

If you're worried about using Proto, you could use:

    regex_skip skip_spaces = nil[check(&not_space::enable)];
    regex_skip normal = nil[check(&not_space::disable)];

instead of:

    BOOST_PROTO_AUTO( skip_spaces, nil[check(&not_space::enable)] );
    BOOST_PROTO_AUTO( normal, nil[check(&not_space::disable)] );

They do the same thing. I've just found BOOST_PROTO_AUTO is faster for small, nested regexes where you're not interested in the match info.

As for performance, I don't know how a modified algorithm would compare with the modified iterator that I used. I presume you are thinking about something like icase (which ignores case), but that ignores spaces. Maybe Eric can answer that one.

Dave Jenkins

24 Nov

4:51 a.m.

"Dave Jenkins" <david@jenkins.net> wrote in message news:fi7d0e$60a$1@ger.gmane.org...


Oh, I see. How about something like this? It allows you to selectively skip spaces for parts of your regex.

I was thinking about the "selective filter iterator" example that I posted and realized that it won't handle xpressive backtracking properly. You can see it malfunction if you change the regex to:

    regex_skip rx = normal >> as_xpr("a a") >>
        skip_spaces >> as_xpr("bb") >>
        *as_xpr('c') >> // This causes the regex to fail when it should succeed
        normal >> "c c";

The "selective filter iterator" idea is right, it's just the iterator needs to remember when to skip spaces as it backtracks. I'll post another (hopefully correct) example when I've thought about it a bit.
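One way to picture "the iterator needs to remember" is to let the skip mode travel with the position instead of living in a global flag. The following is a sketch under that assumption only; it is not the corrected example promised in the message, and Cursor and step are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical cursor: the skip mode is stored next to the position, so a
// saved copy restores both when a matcher backtracks to it.
struct Cursor {
    std::size_t pos;
    bool skip_spaces;
};

// Step over one character, then over any spaces if the cursor is in skip mode.
inline Cursor step(const std::string& text, Cursor c) {
    if (c.pos < text.size()) ++c.pos;
    while (c.skip_spaces && c.pos < text.size() && text[c.pos] == ' ') ++c.pos;
    return c;
}
```

Because the mode is part of the value, restoring a saved Cursor after a failed alternative also restores whether spaces were being skipped at that point, which a single global flag cannot do.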

Eric Niebler

7:21 p.m.

(Currently I'm on vacation. Sorry for the delay.) Jorge Lodos Vigil wrote:


Hi We are using xpressive with a grammar to match certain patterns. In some cases, we have the need to ignore white spaces. Using dynamic regexes, this can be achieved with the ignore_white_space constant. Is there a way to ignore white spaces using a grammar in xpressive other than modifying the grammar itself? We know spirit is an option, but we need to evaluate parsing speed with as many methods as possible. Thanks in advance.

I see that Dave has offered a couple of creative solutions, but that they interact badly with backtracking. What we really need is a whitespace skipping directive. Neither dynamic nor static xpressive has one. (The ignore_white_space option is like perl's /x option, and certainly doesn't do what you want.) The only thing I can think of off the top of my head is to do something like the following:

    BOOST_PROTO_AUTO( _ws, keep(*_s) );

... and then use _ws in your grammar wherever you want to ignore whitespace, like:

    sregex rx = "some stuff" >> _ws >> "other stuff";

_ws will efficiently eat up the whitespace. Not ideal, but it works. If you don't want to use BOOST_PROTO_AUTO, the following is equivalent:

    boost::proto::unary_expr<
        boost::xpressive::detail::keeper_tag
      , boost::proto::dereference<
            boost::proto::terminal<
                boost::xpressive::detail::posix_charset_placeholder
            >::type
        >::type
    >::type const _ws = {{{{"s", false}}}};
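The same "explicit whitespace eater between tokens" idea can be tried with std::regex, used here purely as a stand-in for xpressive. This is a sketch under assumptions: matches_with_optional_ws is a hypothetical name, and the ECMAScript grammar has no atomic group, so the \s* below is an ordinary greedy repeat rather than an equivalent of keep().

```cpp
#include <regex>
#include <string>

// Matches "some stuff", then any amount of whitespace, then "other stuff".
// The \s* plays the role of the _ws sub-expression placed between the tokens.
inline bool matches_with_optional_ws(const std::string& input) {
    static const std::regex rx("some stuff\\s*other stuff");
    return std::regex_match(input, rx);
}
```

The whitespace eater has to be written at every point where whitespace is allowed, which is the "not ideal" part of the approach.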

-- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler

29 Nov

6:42 a.m.

Jorge Lodos Vigil wrote:


Hi We are using xpressive with a grammar to match certain patterns. In some cases, we have the need to ignore white spaces. Using dynamic regexes, this can be achieved with the ignore_white_space constant. Is there a way to ignore white spaces using a grammar in xpressive other than modifying the grammar itself? We know spirit is an option, but we need to evaluate parsing speed with as many methods as possible. Thanks in advance.

I have been experimenting with a skip() directive for xpressive that lets you skip whitespace in a pattern. It does require modifying the grammar, but only in one place:

    sregex rx = skip(_s)(alpha >> +_d);

This is equivalent to:

    sregex rx = keep(*_s) >> alpha >> +(keep(*_s) >> _d) >> *_s;

You can use any valid sub-expression as a skipper. Let me know if you find something like this useful. I'm attaching the code. It is for use with xpressive 2.0, which you can find in subversion or the file vault (http://tinyurl.com/8fean).

-- Eric Niebler Boost Consulting www.boost-consulting.com

///////////////////////////////////////////////////////////////////////////////
// main.hpp
//
//  Copyright 2007 Eric Niebler. Distributed under the Boost
//  Software License, Version 1.0. (See accompanying file
//  LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <iostream>
#include <boost/xpressive/xpressive.hpp>

namespace boost { namespace xpressive
{
    namespace detail
    {
        using proto::_;

        // replace "Expr" with "keep(*State) >> Expr"
        template<typename Grammar>
        struct skip_primitives : Grammar
        {
            template<typename Expr, typename State, typename Visitor>
            struct apply
              : proto::shift_right<
                    typename proto::unary_expr<
                        keeper_tag
                      , typename proto::dereference<State>::type
                    >::type
                  , Expr
                >
            {};

            template<typename Expr, typename State, typename Visitor>
            static typename apply<Expr, State, Visitor>::type
            call(Expr const &expr, State const &state, Visitor &visitor)
            {
                typedef typename apply<Expr, State, Visitor>::type type;
                type that = {{{state}}, expr};
                return that;
            }
        };

        struct Primitives
          : proto::or_<
                proto::terminal<_>
              , proto::comma<_, _>
              , proto::subscript<proto::terminal<set_initializer>, _>
              , proto::assign<proto::terminal<set_initializer>, _>
              , proto::assign<proto::terminal<attribute_placeholder<_> >, _>
              , proto::complement<Primitives>
            >
        {};

        struct SkipGrammar
          : proto::or_<
                skip_primitives<Primitives>
              , proto::assign<proto::terminal<mark_placeholder>, SkipGrammar>   // don't "skip" mark tags
              , proto::subscript<SkipGrammar, _>                                // don't put skips in actions
              , proto::binary_expr<modifier_tag, _, SkipGrammar>                // don't skip modifiers
              , proto::nary_expr<_, proto::vararg<SkipGrammar> >                // everything else is fair game!
            >
        {};

        template<typename Skip>
        struct skip_directive
        {
            typedef typename proto::result_of::as_expr<Skip>::type skip_type;

            skip_directive(Skip const &skip)
              : skip_(proto::as_expr(skip))
            {}

            template<typename Sig>
            struct result;

            template<typename This, typename Expr>
            struct result<This(Expr)>
              : proto::shift_right<
                    typename SkipGrammar::apply<
                        typename proto::result_of::as_expr<Expr>::type
                      , skip_type
                      , mpl::void_
                    >::type
                  , typename proto::dereference<skip_type>::type
                >
            {};

            template<typename Expr>
            typename result<skip_directive(Expr)>::type
            operator ()(Expr const &expr) const
            {
                mpl::void_ ignore;
                typedef typename result<skip_directive(Expr)>::type result_type;
                result_type result = {SkipGrammar::call(proto::as_expr(expr), this->skip_, ignore), {skip_}};
                return result;
            }

        private:
            skip_type skip_;
        };
    }

    // skip
    template<typename Skip>
    detail::skip_directive<Skip> skip(Skip const &skip)
    {
        return detail::skip_directive<Skip>(skip);
    }
}}

using namespace boost::xpressive;

int main()
{
    std::string s = "a a b b c c";

    sregex rx =
        "a a" >>
        skip(_s)
        (
            (s1= as_xpr('b')) >>
            as_xpr('b') >>
            *as_xpr('c')    // causes backtracking
        ) >>
        "c c";

    smatch what;
    bool ok = regex_match(s, what, rx);
    std::cout << (ok ? "found" : "not found") << '\n';

    s = "123,456,789";
    sregex rx2 = skip(',')(+_d);
    ok = regex_match(s, what, rx2);
    std::cout << (ok ? "found" : "not found") << '\n';

    return 0;
}

//#include <map>
//#include <boost/xpressive/regex_actions.hpp>
//
//template<typename Expr>
//void test_skip_aux(Expr const &expr)
//{
//    sregex rx = skip(_s)(expr);
//}
//
//void test_skip()
//{
//    int i=0;
//    std::map<std::string, int> syms;
//    std::locale loc;
//
//    test_skip_aux( 'a' );
//    test_skip_aux( _ );
//    test_skip_aux( +_ );
//    test_skip_aux( -+_ );
//    test_skip_aux( !_ );
//    test_skip_aux( -!_ );
//    test_skip_aux( repeat<0,42>(_) );
//    test_skip_aux( -repeat<0,42>(_) );
//    test_skip_aux( _ >> 'a' );
//    test_skip_aux( _ >> 'a' | _ );
//    test_skip_aux( _ >> 'a' | _ >> 'b' );
//    test_skip_aux( s1= _ >> 'a' | _ >> 'b' );
//    test_skip_aux( icase(_ >> 'a' | _ >> 'b') );
//    test_skip_aux( imbue(loc)(_ >> 'a' | _ >> 'b') );
//    test_skip_aux( (set='a') );
//    test_skip_aux( (set='a','b') );
//    test_skip_aux( ~(set='a') );
//    test_skip_aux( ~(set='a','b') );
//    test_skip_aux( range('a','b') );
//    test_skip_aux( ~range('a','b') );
//    test_skip_aux( set['a' | alpha] );
//    test_skip_aux( ~set['a' | alpha] );
//    test_skip_aux( before(_) );
//    test_skip_aux( ~before(_) );
//    test_skip_aux( after(_) );
//    test_skip_aux( ~after(_) );
//    test_skip_aux( keep(*_) );
//    test_skip_aux( (*_)[ref(i) = as<int>(_) + 1] );
//    test_skip_aux( (a1= syms)[ref(i) = a1 + 1] );
//}

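The rewriting that skip() performs (a skipper before every primitive, plus a trailing one) can be illustrated by hand with std::regex, used here only as a stand-in for xpressive. This is a sketch under assumptions: both function names are hypothetical, the patterns are manual expansions rather than output of the attached code, and ECMAScript has no atomic group, so plain \s* and ,* take the place of keep().

```cpp
#include <regex>
#include <string>

// Hand-expanded analogue of skip(_s)(alpha >> +_d):
//   whitespace, a letter, one or more (whitespace, digit), trailing whitespace.
inline bool match_alpha_digits_skipping_ws(const std::string& input) {
    static const std::regex rx("\\s*[[:alpha:]](?:\\s*[[:digit:]])+\\s*");
    return std::regex_match(input, rx);
}

// Hand-expanded analogue of skip(',')(+_d):
//   commas may appear before any digit and after the last one.
inline bool match_digits_skipping_commas(const std::string& input) {
    static const std::regex rx("(?:,*[[:digit:]])+,*");
    return std::regex_match(input, rx);
}
```

Writing the skipper out at every position like this is what the directive automates; the grammar itself only changes in the one place where skip() wraps it.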