Help needed creating a regex...
Hi everyone,

I need help creating a regular expression. I need to select only the sentences in a string using a regex, so I wrote this code:

*boost::regex re( "(\\S.+?[.!?])(?=\\s+|$)" );*

This code assumes that a sentence ends at any of the three characters . ! ? but it treats the string "Mr. Obama" as two different sentences (because of the dot in between), whereas I want it to be a single sentence.

I have a string vector of all the words that have to be excluded:

std::vector<std::string> exceptionVector;
exceptionVector.push_back("Mr.");
exceptionVector.push_back("Dr.");
exceptionVector.push_back("Mrs.");

Now I want my regex to handle these: if any of the words from my vector occurs, it should NOT start a separate sentence. Any idea how I can create this regular expression?

Thanks in advance,
-Subhash
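One possible single-regex answer is to let each listed abbreviation be consumed as a unit, so its dot is never available as the sentence terminator. The sketch below is only an illustration under some assumptions: it uses std::regex (ECMAScript syntax) instead of boost::regex purely to avoid an external dependency, it avoids lookbehind because std::regex does not support it, and `splitWithExceptionRegex` and `escapeRegex` are hypothetical helper names. The exception words are escaped before being spliced into the pattern because "." is a regex metacharacter.

```cpp
#include <regex>
#include <string>
#include <vector>

// Escape regex metacharacters so that "Mr." matches a literal dot.
std::string escapeRegex(const std::string& s) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper. For {"Mr.", "Dr.", "Mrs."} it builds
//   (?=\S)(?:\b(?:Mr\.|Dr\.|Mrs\.)|[^.!?])*[.!?](?=\s|$)
// A listed abbreviation is matched as one unit, so its dot can never be
// taken as the terminating [.!?].
std::vector<std::string> splitWithExceptionRegex(const std::string& text,
                                                 const std::vector<std::string>& exceptions) {
    std::string alternatives;
    for (const std::string& word : exceptions) {
        if (!alternatives.empty()) alternatives += '|';
        alternatives += escapeRegex(word);
    }
    std::string unit = "[^.!?]";  // any character that cannot end a sentence
    if (!alternatives.empty()) unit = R"(\b(?:)" + alternatives + ")|" + unit;
    // (?=\S) skips leading whitespace; (?=\s|$) mirrors the original lookahead.
    const std::regex re(R"((?=\S)(?:)" + unit + R"()*[.!?](?=\s|$))");

    std::vector<std::string> sentences;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        sentences.push_back(it->str());
    return sentences;
}
```

Called as `splitWithExceptionRegex(text, exceptionVector)` on "I met Mr. Obama today. Mrs. Smith met Dr. Jones! Really?", this should yield three sentences. The constructs used (`\b`, `(?:...)`, `(?=...)`) are also part of boost::regex's default Perl syntax, so the same pattern string should carry over to boost::regex and boost::sregex_iterator.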
Subhash Nagre wrote:
Hi everyone, I need help creating a regular expression. I need to select only the sentences in a string using a regex, so I wrote this code:
*boost::regex re( "(\\S.+?[.!?])(?=\\s+|$)" );*
Wouldn't it be easier, and far more efficient, to use boost::split with a stateful delimiter predicate that keeps track of the last three characters encountered and does not return true on a "." if the preceding two or three characters match one of the abbreviations Dr, Mr, Mrs, etc.?
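A minimal sketch of the stateful-delimiter idea follows. It is written as a plain loop rather than a call to boost::split, so it does not depend on how boost::split copies or invokes its predicate, and it keeps the terminating punctuation in each sentence. `SentenceDelimiter` and `splitWithDelimiter` are hypothetical names. As a small assumption beyond the suggestion above, the window holds one character more than the longest abbreviation, so the delimiter can also check that "Mr" starts a word.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stateful delimiter: it remembers the most recent characters
// and refuses to treat '.' as a break when they end in an abbreviation.
class SentenceDelimiter {
public:
    explicit SentenceDelimiter(std::vector<std::string> abbreviations)
        : abbreviations_(std::move(abbreviations)) {
        // Window = longest abbreviation + 1 char for the word-boundary check.
        for (const std::string& a : abbreviations_)
            if (a.size() + 1 > window_) window_ = a.size() + 1;
    }

    // Returns true if `c` ends a sentence, given the characters seen before it.
    bool operator()(char c) {
        const bool isBreak = c == '!' || c == '?' || (c == '.' && !endsWithAbbreviation());
        recent_ += c;
        if (recent_.size() > window_) recent_.erase(0, recent_.size() - window_);
        return isBreak;
    }

private:
    bool endsWithAbbreviation() const {
        for (const std::string& abbr : abbreviations_) {
            if (recent_.size() < abbr.size()) continue;
            const std::size_t start = recent_.size() - abbr.size();
            if (recent_.compare(start, abbr.size(), abbr) != 0) continue;
            // The abbreviation must start a word: start of text, or a
            // non-alphanumeric character right before it.
            if (start == 0 || !std::isalnum(static_cast<unsigned char>(recent_[start - 1])))
                return true;
        }
        return false;
    }

    std::vector<std::string> abbreviations_;
    std::size_t window_ = 1;
    std::string recent_;
};

// Feeds every character to the delimiter and cuts the text where it says so.
std::vector<std::string> splitWithDelimiter(const std::string& text, SentenceDelimiter delim) {
    std::vector<std::string> sentences;
    std::string current;
    for (char c : text) {
        const bool isBreak = delim(c);  // the delimiter must see whitespace too
        if (current.empty() && std::isspace(static_cast<unsigned char>(c))) continue;
        current += c;
        if (isBreak) {
            sentences.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) sentences.push_back(current);  // trailing text without a terminator
    return sentences;
}
```

The abbreviations are passed without their dots, e.g. `SentenceDelimiter({"Mr", "Dr", "Mrs"})`, matching the list above. No regex is involved, and each character is examined once.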
Subhash Nagre wrote:
Now I want my regex to handle these: if any of the words from my vector occurs, it should NOT start a separate sentence. Any idea how I can create this regular expression?
I'd suggest not trying to do *everything* with a single regex. Why not just use a regex that matches "a sequence of stuff up to and including a period", and then analyze the resulting matches to decide what constitutes an actual sentence?

Best regards,
Tony
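The two-step approach might look roughly like this: a deliberately simple regex finds candidate fragments, and ordinary C++ code then glues a fragment onto the next one whenever the text so far ends in one of the exception words. This is a sketch under assumptions: the fragment regex `[^.!?]*[.!?]` covers all three terminators from the original question rather than only the period, std::regex stands in for boost::regex to keep the example dependency-free, and `splitAndMerge`, `endsWithException` and `trim` are hypothetical helpers. Text after the last terminator is never matched by the regex, so this sketch ignores it.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Strip the leading/trailing whitespace that the fragment regex picks up.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// True if `fragment` ends with one of the exception words as a whole word,
// e.g. "I met Mr." ends with "Mr.".
bool endsWithException(const std::string& fragment,
                       const std::vector<std::string>& exceptions) {
    for (const std::string& ex : exceptions) {
        if (fragment.size() < ex.size()) continue;
        const std::size_t start = fragment.size() - ex.size();
        if (fragment.compare(start, ex.size(), ex) != 0) continue;
        if (start == 0 || std::isspace(static_cast<unsigned char>(fragment[start - 1])))
            return true;
    }
    return false;
}

std::vector<std::string> splitAndMerge(const std::string& text,
                                       const std::vector<std::string>& exceptions) {
    // Step 1: the regex only finds candidate fragments ending in . ! or ?
    static const std::regex fragmentRe(R"([^.!?]*[.!?])");
    std::vector<std::string> sentences;
    std::string pending;
    for (std::sregex_iterator it(text.begin(), text.end(), fragmentRe), end; it != end; ++it) {
        pending += it->str();
        // Step 2: keep gluing while the text so far ends in an abbreviation.
        if (endsWithException(pending, exceptions)) continue;
        sentences.push_back(trim(pending));
        pending.clear();
    }
    if (!pending.empty()) sentences.push_back(trim(pending));
    return sentences;
}
```

Keeping the regex trivial means the exception list can grow (or become case-insensitive, or context-dependent) without touching the pattern at all.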