
What is the text you are matching against? If you can give me a concrete example I can test it here, but "it hangs" isn't very useful I'm afraid ;-)
[ Sorry if the editing is awful. I kept simplifying and now believe I have a tractable test case; I didn't expect data dependence earlier. I now have a single regex that works or fails depending on a simple data change, as shown below. And, again, the same program runs tens of longer data strings against thousands of SIMPLER regexes and matches exactly when diffed against the greta results. ]
Thanks. If you have an easy way to test this, the scenario is as follows: I have a file containing multiple strings (gene sequences, FWIW) and a set of rules (regexes) that describe interesting features in the strings. The normal procedure is to apply the entire rule set to each string and return a vector of hits per string. I played around with one of these files to find the simplest case I could that causes the error.
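For anyone trying to reproduce this, the scenario above (every rule applied to every sequence, one vector of hits per sequence) can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual rules_annotater code: the names `Hit` and `annotate` are invented for illustration, and it uses C++11 `std::regex` in place of `boost::regex` so it builds without Boost (Boost.Regex provides analogous `boost::regex` and `boost::sregex_iterator` types, so porting should be largely mechanical).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One hit (hypothetical layout): which rule matched, and the offset/length
// of the whole match, i.e. sub-expression 0.
struct Hit {
    std::size_t rule;
    std::size_t offset;
    std::size_t length;
};

// Apply every rule to every sequence and return one vector of hits per
// sequence. Rules are compiled once, outside the per-sequence loop.
std::vector<std::vector<Hit>> annotate(const std::vector<std::string>& sequences,
                                       const std::vector<std::string>& rules) {
    std::vector<std::regex> compiled;
    compiled.reserve(rules.size());
    for (const auto& r : rules) compiled.emplace_back(r);  // may throw std::regex_error

    std::vector<std::vector<Hit>> result(sequences.size());
    for (std::size_t s = 0; s < sequences.size(); ++s) {
        const std::string& seq = sequences[s];
        for (std::size_t r = 0; r < compiled.size(); ++r) {
            // sregex_iterator walks successive non-overlapping matches.
            for (std::sregex_iterator it(seq.begin(), seq.end(), compiled[r]), end;
                 it != end; ++it) {
                result[s].push_back({r,
                                     static_cast<std::size_t>(it->position(0)),
                                     static_cast<std::size_t>(it->length(0))});
            }
        }
    }
    return result;
}
```

For example, `annotate({"CCATGAAACCCTAGTT"}, {"ATG(...)*?(TAG|TAA|TGA)"})` yields a single hit for rule 0 at offset 2 with length 12 (`ATGAAACCCTAG`). Whether the real annotator wants non-overlapping matches, as `sregex_iterator` reports them, is an assumption.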
In these traces, the myboost.cpp114 line shows the query and the myboost.cpp115 line shows the sample (the actual sample contains no whitespace, CR/LF, etc.). Unlike my first example with the assertion support, this one seems to make it through all the rules at least once.
This hangs (I thought it might be a repetition problem, but it reliably hits on this data/regex combination on the first pass):
$ $progpath/rules_annotater -clean -boost -doall -fastas q_fasta -debug -rules $progpath/boost_edit_rulesx > asdf
myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
myboost.cpp115 in
GAGATATTCACCTCTCATTGCCTTTTCCAGAGGTTGTTGAACTTAGTGGCCTGAGCATTTTA
TCTGCAAAATGACTAGCAATTTTTTTTTAAGTTTCAGGCTTTTTTAATGCCCTAAATACAGTTGATCCATTACCGAGTGT
GTTACATGCATAGGAATTTACTGATCTTTTCTTTTCCCCCTAGCTAGTTTTAAAGTTACTGAGCATAACGAGCTTTAAAA
ATTCTTCAGAATACAAATAAATGAATAGATAAAAGACTACCTCCATTTGATAAATCATTCAAGAAAAAGAAAAAAAAACT
TGAGCAAGCTAAGAAAGTCATTAACAGCCATATTTCTGATGGAACTAATGTxGATACCTACTCAAGCTAxCACTxGAATC
TAATAATCTGTGAGAGAAGAAATGGGAAAAGGTATGAAAGC
myboost.cpp121 looking for subexpr 0
myboost.cpp139
This does NOT hang (note that the two samples differ only in the lowercase "x" characters on the fifth line of sequence data):
$ $progpath/rules_annotater -clean -boost -doall -fastas q_fasta -debug -rules $progpath/boost_edit_rulesx > asdf
myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
myboost.cpp115 in
GAGATATTCACCTCTCATTGCCTTTTCCAGAGGTTGTTGAACTTAGTGGCCTGAGCATTTTA
TCTGCAAAATGACTAGCAATTTTTTTTTAAGTTTCAGGCTTTTTTAATGCCCTAAATACAGTTGATCCATTACCGAGTGT
GTTACATGCATAGGAATTTACTGATCTTTTCTTTTCCCCCTAGCTAGTTTTAAAGTTACTGAGCATAACGAGCTTTAAAA
ATTCTTCAGAATACAAATAAATGAATAGATAAAAGACTACCTCCATTTGATAAATCATTCAAGAAAAAGAAAAAAAAACT
TGAGCAAGCTAAGAAAGTCATTAACAGCCATATTTCTGATGGAACTAATGTxGATACCTACTxCAAGCTAxCACTxGAAT
CTAATAATCTGTGAGAGAAGAAATGGGAAAAGGTATGAAAGC
myboost.cpp121 looking for subexpr 0
myboost.cpp139
Administrator@TESTBED01 /cygdrive/e/new/temp/canis/known/grade_R/misc_tgf
$
All of these are char*, not std::string FWIW.
boost::regex expression(query);
boost::match_results
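Since the program works on `char*` buffers, the search step can be sketched with the iterator-range overload of `regex_search` and a `match_results<const char*>` (`std::cmatch`). This is an assumption about how the truncated snippet above continues, not the original code: `first_hit` and `print_subexpressions` are hypothetical helper names, and `std::regex` again stands in for `boost::regex` (Boost.Regex has the matching `boost::cmatch` typedef and `boost::regex_search` overload).

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>
#include <regex>

// Search a NUL-terminated char buffer for the first match of `rule`.
// std::cmatch is std::match_results<const char*>; it stores pointers into
// `text`, so `text` must outlive `what`.
bool first_hit(const char* rule, const char* text, std::cmatch& what) {
    std::regex expression(rule);  // may throw std::regex_error
    const char* first = text;
    const char* last = text + std::strlen(text);
    return std::regex_search(first, last, what, expression);
}

// Print every sub-expression of a successful match, in the spirit of the
// "looking for subexpr 0" debug trace above.
void print_subexpressions(const std::cmatch& what) {
    for (std::size_t i = 0; i < what.size(); ++i) {
        std::cout << "subexpr " << i;
        if (what[i].matched)
            std::cout << " at " << what.position(i) << ": " << what.str(i) << '\n';
        else
            std::cout << " (did not participate)\n";
    }
}
```

For the input `CCATGAAACCCTAGTT` and the rule `ATG(...)*?(TAG|TAA|TGA)`, sub-expression 0 is `ATGAAACCCTAG` at offset 2 and sub-expression 2 is `TAG`; sub-expression 1 sits inside the repeated `(...)` group, so it holds only the last repetition's three characters.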
From: "John Maddock"
Reply-To: boost-users@lists.boost.org
To:
Subject: Re: [Boost-users] follow up on regex questions
Date: Thu, 4 Oct 2007 17:37:22 +0100

Mike Marchywka wrote:
Hi, thanks for your help in the past. I would normally drop the issue at this point until I get my build environment cleaned up ("My build is messed up, I haven't read the documentation. What is wrong with YOUR library?" LOL), but I do have one more question which I believe is related to boost regex processing. If someone has a known good regex test program or can point to an obvious problem, it may be helpful.
You mean libs/regex/test/regress/*.cpp ?
It would be a good idea to build and run this, at least to verify the sanity of your setup: I still suspect that the binaries you are using are not compatible with your build options or regex headers, but I can't be sure.
Again, this code seems to work with Microsoft's greta, and boost gives identical results on a longer list of SIMPLER regexes, so I reasonably believe that the problem is due to the handling of more complicated expressions. (One caveat, to be complete: greta did seem to return some spurious results, such as negative locations, but they are easily filtered programmatically, and the plausible ones that I have checked manually are right.) However, on this sequence of regexes (regexi?) I get either an abort OR the program hangs later in nonsensical execution (I know, "Gee, you have a build problem and the stack is messed up?").
myboost.cpp114 (GU.*?TACTAAC.{20,40}AG|^)(.*?)(GU.*?TACTAAC.{20,40}AG)
myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
myboost.cpp114 TATAA.*?ATAAA
myboost.cpp114 (GU.*?TACTAAC.{20,40}AG|^)(.*?)(GU.*?TACTAAC.{20,40}AG)
myboost.cpp114 ATG(...)*?(TAG|TAA|TGA)
(the program hangs in my code, or had been core dumping in boost::regex)
What is the text you are matching against? If you can give me a concrete example I can test it here, but "it hangs" isn't very useful I'm afraid ;-)
John.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users