boost::xpressive - Regex stack space exhausted
Hi,
I need to match a hex-written byte array, optionally separated with spaces. So I tried:
boost::xpressive::sregex r = * ( * blank >> repeat<2,2> (xdigit));
smatch match;
regex_match (input, match, r);
When I use an input of approx. 150 hex pairs, I get an exception "Regex stack space exhausted" (I use the default stack size of Visual Studio 2008).
This pattern looks quite simple, so I'd like to know if there is some fundamental problem with this expression.
I know I can enlarge the stack manually, but I expect the input to be about 10 kB long, depending on user input, so I don't consider that a good solution.
Pavol
On 7/21/2010 2:39 PM, Pavol Supa wrote:
This pattern looks quite simple, so I'd like to know, if there is some fundamental problem with this expressions.
Yes. See http://www.boost.org/doc/libs/1_43_0/doc/html/xpressive/user_s_guide.html#bo...
Not only will this pattern tear through stack, it'll run very slowly. Try this instead:
* ( keep(*blank) >> repeat<2,2> (xdigit))
-- Eric Niebler BoostPro Computing http://www.boostpro.com
Thanks Eric for the info. I already saw the "Beware Nested
Quantifiers" chapter, but in fact, even if I do not use a quantifier
inside the parentheses, it overflows the stack. Regarding "keep",
I tried many combinations of putting the subexpressions into keep();
I will try this one and see if it helps.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Wed, Jul 21, 2010 at 5:25 PM, Pavol Supa
Thanks Eric for the info. I already saw the "Beware Nested Quantifiers" chapter, but in fact, even if i do not use quantifier inside the parentheses, it overflows the stack. Regarding the "keep", i tried many combinations of putting the subexpressions into keep(), i will try this one if it helps
Please do not top post...
You might think of trying Boost.Spirit.Qi. It is a PEG parser (whereas
xpressive is a regex engine). PEG does not explode like regex does,
since its repetition is purely greedy, and it runs a great deal faster too.
On Wed, Jul 21, 2010 at 10:52 PM, Eric Niebler
Not only will this pattern tear through stack, it'll run very slowly. Try this instead:
* ( keep(*blank) >> repeat<2,2> (xdigit))
So, I tried it. It throws exceptions at ~230 hex-digit pairs. I played with the "keep"s; the only 'better' combination is
* keep ( (*blank) >> repeat<2,2> (xdigit))
which throws when the input has 300 pairs.
On Thu, Jul 22, 2010 at 3:29 AM, Pavol Supa
So, i tried it. It throws exceptions at ~230 hexdigit pairs. I played with "keep"s, the only 'better' combination is
* keep ( (*blank) >> repeat<2,2> (xdigit))
which throws when input has 300 pairs
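If you were using Boost.Spirit.Qi, then it should 'just work'. The
rule in Boost.Spirit.Qi for a simple match like the above regex
version would be:
boost::spirit::qi::rule r = *lexeme[xdigit >> xdigit];
It can also parse and stuff it all into a string, a vector, or whatever, as characters, or parse it into integers/shorts/whatever. Basically any parsing need can be easily fulfilled; you really should try Boost.Spirit.Qi.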
On Thu, Jul 22, 2010 at 11:40 AM, OvermindDL1
If you were using Boost.Spirit.Qi, then it should 'just work', that rule in Boost.Spirit.Qi for a simple match like the above Regex version is would be: boost::spirit::qi::rule
r = *lexeme[xdigit >> xdigit]; It can also parse and stuff it all into a string, a vector, or whatever, as characters or parse it into integers/shorts/whatever, basically any parsing need can be easily fulfilled, you really should try Boost.Spirit.Qi.
OK, thanks for the advice. I will suggest using Boost.Spirit.Qi to my colleagues, but it is not my decision.
On 7/22/2010 5:29 AM, Pavol Supa wrote:
boost::xpressive::sregex r = * ( * blank >> repeat<2,2> (xdigit));
Wait a minute. What does the data you're trying to match look like? Do you know that xdigit only matches a single hex character? Your data should look something like:
00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00 11 ....
Is that right?
* ( keep(*blank) >> repeat<2,2> (xdigit))
So, i tried it. It throws exceptions at ~230 hexdigit pairs.
That doesn't sound right. This could be a bug. I'll look at this today. -- Eric Niebler BoostPro Computing http://www.boostpro.com
On Thu, Jul 22, 2010 at 2:33 PM, Eric Niebler
Wait a minute. What does the data you're trying to match look like? Do you know that xdigit only matches a single hex character? Your data should look something like:
00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00 11 ....
Is that right?
I used data without any spaces: 000000000000...00000
The data are most often without spaces, so in the meantime I decided to use this workaround (it doesn't throw):
expr = * repeat<2,2> (xdigit)
But if I move the inner expression to a function, it starts throwing (at a similar length), i.e.:
sregex byte() { return repeat<2,2> (xdigit); }
sregex expr = * byte();
That doesn't sound right. This could be a bug. I'll look at this today.
I tried to debug the parsing, but it looked OK: when there is a non-trivial expression inside (as mentioned above), it goes into recursion, with about 12 functions on the call stack, and it eats the whole (default) stack at the quantities mentioned.
On 7/22/2010 10:01 AM, Pavol Supa wrote:
i used data just without spaces, 000000000000...00000
the data are most often without spaces, so i decided meanwhile to use the workaround (it doesn't throw): expr = * repeat<2,2> (xdigit)
Then why not just "expr = *xdigit" in that case?
but if i move the inner expression to a function, it starts throwing (at similar length), i.e.: sregex byte() { return repeat<2,2> (xdigit) } sregex expr = * byte();
Yes, these are different beasts altogether. A nested regex match creates a nested match_results object, with nested submatches. By assigning to a regex object, you have erased the type information that xpressive uses to optimize matching. In this case, xpressive no longer knows that it's quantifying something of fixed-width so that backtracking is unnecessary.
i tried to debug the parsing, but it looked ok - when there is a non-trivial expression inside (as mentioned above), it goes into recursion, about 12 functions on callstack, and it eats whole (default) stack at those mentioned quantities
See above. I hope it makes sense now. But still, *(keep(*blank)>>repeat<2,2>(xdigit)) should work just fine. I'll look. -- Eric Niebler BoostPro Computing http://www.boostpro.com
On 7/22/2010 10:10 AM, Eric Niebler wrote:
On 7/22/2010 10:01 AM, Pavol Supa wrote:
But still, *(keep(*blank)>>repeat<2,2>(xdigit)) should work just fine. I'll look.
Shoot, I didn't get to this today. It's on my list, but I'm super-busy right now. Sorry! :-( -- Eric Niebler BoostPro Computing http://www.boostpro.com
On Fri, Jul 23, 2010 at 5:09 AM, Eric Niebler
Shoot, I didn't get to this today. It's on my list, but I'm super-busy right now. Sorry! :-(
No problem, I have a workaround, and soon I will try Boost.Spirit.Qi, as OvermindDL1 advised. Just please let me know if you find out something.
On Fri, Jul 23, 2010 at 3:46 AM, Pavol Supa
No problem, i have a workaround, and soon i will try the Boost.Spirit.Qi, as OvermindDL1 advised.
Just please let me know if you find out something.
Also note this regex option "BOOST_REGEX_NON_RECURSIVE" on page: http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/boost_regex/configu...
Which is the default for most compilers now. However, that doesn't solve the issue if the problem is that the regular expression is so inefficient that it just gobbles up space.... John.
On 7/21/2010 2:39 PM, Pavol Supa wrote:
boost::xpressive::sregex r = * ( * blank >> repeat<2,2> (xdigit)); smatch match; regex_match (input, match, r);
Oh also, beware of blank; only the <space> and <tab> characters are guaranteed to be in this set. -- Eric Niebler BoostPro Computing http://www.boostpro.com
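If the input may contain other whitespace (newlines, for example), one option is to strip it before matching, so that the pattern does not need blank at all. A minimal sketch in plain C++: strip_whitespace is a hypothetical helper, not part of xpressive, and it relies on std::isspace in the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `s` with every whitespace
// character (as classified by std::isspace) removed.
std::string strip_whitespace(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}
```

Be aware that this also joins digits that were separated by whitespace, so "0 0" becomes a valid pair, which the original pattern would reject.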
participants (4)
- Eric Niebler
- John Maddock
- OvermindDL1
- Pavol Supa