[xpressive] how to avoid running into stack overflow so easily

Hi Eric,

(Apologies in advance for such a stupid question.)

I'm building a fairly simple parser using xpressive. I need to parse text like this:

    node C91967
    {
        label = "What brand of coffee do you use?";
        position = (100 280);
        states = ("Columbian blend" "Columbian gold" "Columbian all star" "Columbian cinnamon");
        HR_Inaccuracy = "0";
        HR_Risk = "0";
        HR_Time = "1";
        HR_RefKey = "";
        HR_Desc = "Question";
        HR_ID = "91967";
        HR_Def = "<div align=\"left\"> </div><br/>";
        HR_Web_Page = "";
        HR_Type2 = "3";
    }

    potential (C91976 | CXXX)
    {
        data = ( 0 1      % Coffee type not to your taste
                 0 1      % Discount coffee
                 0.98 0.02    % Low quality coffee filter
                 0 1      % Low water quality
                 0 1      % Old coffee machine
                 0 1      % Used coffee filter
                 0 1      % Water coffee mixture incorrect
                 0 1 );   % Other Problem
    }

There are multiple nodes and potentials in the files, so my final regex looks something like this:

    netFile = modelConfiguration >> causeNode >> actions >> optional( questions )
              >> causeProbabilities >> actionProbabilities;

I then make one regex to parse a single potential, and wrap that regex in another so I can look for a list of those potentials:

    sregex row = *space >> _float[push_back(ref(actionProbs),as<Float>(_))] >> space >> -*_ >> _ln;
    actionPotential = -*_ >> "potential (C" >> -*_ >> "data = (" >> +row >> -*_ >> '}';
    actionProbabilities = +( *space >> actionPotential );

Now my problem is that I get a stack overflow very easily if the 'row' regex is too long. In fact, I have to remove the last ">> space" from 'row' to avoid a stack overflow.

Am I using the library in a fundamentally wrong way? If so, how can I fix it?

best regards
-Thorsten

Thorsten Ottosen wrote:
In case it helps, below is my complete function.

-Thorsten

-----------------------

    NaiveModel NaiveModel::loadNetFile( const std::string& file )
    {
        using namespace boost::xpressive;

        NaiveModel res( file );
        std::string data = readFile( file );

        std::vector<Float> costWeights;
        std::vector<Float> costConversion;
        std::vector<std::string> causeNames;
        std::vector<std::string> actionNames;
        std::map<std::string,std::string> actionNameToId;
        std::vector<std::string> questionNames;
        std::map<std::string,std::string> questionNameToId;
        std::vector<Float> causeProbs;
        std::vector<Float> actionProbs;
        std::vector<Float> questionProbs;

        bx::sregex space;
        bx::sregex nodeStart;
        bx::sregex nodeEnd;
        bx::sregex _float;
        bx::sregex name;
        bx::sregex modelConfiguration;
        bx::sregex causeNode;
        bx::sregex actionNode;
        bx::sregex actions;
        bx::sregex questionNode;
        bx::sregex questions;
        bx::sregex causeProbabilities;
        bx::sregex actionPotential;
        bx::sregex actionProbabilities;
        bx::sregex questionProbabilities;
        bx::sregex netFile;

        space = _s | _ln;
        nodeStart = +_d >> +space >> '{' >> +space;
        nodeEnd = as_xpr(';') >> +space >> '}';
        _float = +_d | +_d >> (as_xpr('.') | ',') >> *_d;
        // @todo: consider -+~(set='"')
        name = *space >> '"' >> +(~set['"']) >> '"';
        modelConfiguration = as_xpr("net") >> *space >> '{' >> -*_ >> '}';
        causeNode = *_ >> "HR_Desc = \"All causes\";" >> *space >> "states = ("
                    >> +(name[push_back(ref(causeNames),as<std::string>(_) )])
                    >> -*_ >> nodeEnd;
        actionNode = (as_xpr( "node C" ) >> nodeStart >> "label = " >> (s1=name) >> -*_
                      >> "HR_Desc = \"Action\"" >> -*_ >> "HR_ID = \"" >> (s2=+_d) >> -*_ >> nodeEnd)
                     [ ref(actionNameToId)[s1] = as<std::string>(s2) ];
        actions = +( *space >> actionNode );
        questionNode = (as_xpr( "node C" ) >> nodeStart >> "label = " >> (s1=name) >> -*_
                        >> "HR_Desc = \"Question\"" >> -*_ >> "HR_ID = \"" >> (s2=+_d) >> -*_ >> nodeEnd)
                       [ ref(questionNameToId)[s1] = as<std::string>(s2) ];
        questions = +( *space >> questionNode );
        causeProbabilities = -*_ >> "potential (CXXX" >> -*_ >> "data = ("
                             >> +( *space >> _float[push_back(ref(causeProbs),as<Float>(_))] )
                             >> *space >> ")" >> nodeEnd;
        sregex row = *space >> _float[push_back(ref(actionProbs),as<Float>(_))] >> space >> -*_ >> _ln;
        actionPotential = -*_ >> "potential (C" >> -*_ >> "data = (" >> +row >> -*_ >> '}';
        actionProbabilities = +( *space >> actionPotential );
        netFile = *space >> modelConfiguration >> causeNode >> actions >> optional( questions )
                  >> causeProbabilities >> actionProbabilities >>
                  //questionProbabilities >>
                  *_;

        bx::smatch what;
        if( bx::regex_search( data, what, netFile ) )
        {
            Engine::print( causeNames );
            Engine::printMap( actionNameToId );
            Engine::printMap( questionNameToId );
            Engine::print( questionNames );
            Engine::print( causeProbs );
            Engine::print( actionProbs );
            //std::cerr << "\n\n" << what[0] << "\n\n" << std::endl;
        }
        else
            std::cerr << "\n\ndid not match file" << file << "!\n\n" << std::endl;

        //DEZIDE_ENFORCE_MSG( causeProbs.size() == causeNames().size(),
        //                    "Invalid parsing of causes!" );
        //std::cerr << data;
        return res;
    }

Thorsten Ottosen wrote:
> In case it helps, below is my complete function.
The single most important thing you can do, both to reduce stack space and to improve parse times, is to eliminate backtracking where you can with the keep() directive. For instance, in a lot of places you have *space >> X, where X cannot match a whitespace character and space is an sregex defined as _s | _ln. Replace it with keep(*_s) >> X (or keep(+_s) where you have +space). It'll mean the same thing, use vastly less stack and be *much* more efficient.
And a tip: when you quantify something simple (e.g., a single char matcher) like +_s, it is implemented iteratively. When you quantify something that can match a variable number of characters, like +space or +(_s | _ln), it is implemented recursively, so every repetition costs stack space.

HTH,

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
participants (2): Eric Niebler, Thorsten Ottosen