Can I use Boost.Regex with multiline text to be recognized?
More accurately, my question should be:
How hard (or easy) is it to deal with multiple lines of parsed text in Regex?
Is there anything special to do? (such as defining an end of line character(s))
Perhaps the best advice is to stay away from Regex and use something more powerful (such as Xpressive)?
TIA,
-Ramon
In reply to Ramon F Herrera (Mon, Sep 14, 2009 at 6:12 PM), OvermindDL1 wrote:
Regex can handle multiple lines easily; it is all one text blob as far as it is concerned. What you should use depends on what you are doing, so what are you trying to parse out?
Hi Overmind,

I am trying to parse multiple files with the structure indicated below. I have sort of got started, but I would hate to hit a wall later on.

I guess I could start by defining a line like this:

string variable = "([A-Za-z0-9][\\w\\h\\(\\)\\-\\.,/&]*)";
char equal_sign = '=';
string value = "(.+)";
string assignment = variable + equal_sign + value;
string line = assignment + eol;

Any tips and hints are most appreciated and welcome...

-Ramon

---------------------
[Unique ID 1]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
[Unique ID 2]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
[Unique ID 3]
Variable Name = Variable Value
Variable Name = Variable Value
Variable Name = Variable Value
In reply to Ramon F Herrera (Mon, Sep 14, 2009 at 6:35 PM), OvermindDL1 wrote:
You could do that using regex, but you will have to parse it into a structure yourself. You did not state it, so I will just assume that the things in [] are section headings (a la an ini file) and that they are required first, and that the Variable Names can be duplicated (a la an ini file) and are ordered. If so, it might just be easier to use boost::spirit 2.1, as it can do the parsing and fill in your data structure all in one step, and it will run a great deal faster than regex. Something like this code would probably work:

// Have not tested this code, writing it inside the email client itself...
std::map< std::string, std::vector< std::pair<std::string,std::string> > > dataStuff;

using namespace boost::spirit;
using namespace boost::spirit::qi;
using namespace boost::spirit::standard;
bool successful = parse(inputstream.begin(), inputstream.end(),
    *( '[' >> *(print-']') >> ']' >> eol
       >> ( +(print-(*space>>'=')) >> *space >> '=' >> *space >> +print >> eol )
     )
    , dataStuff);

As always, I make no guarantees of the quality of my above code when I am running on 6 hours past when I should be sleeping. You can also add a _pass semantic action to the first string match so you can absolutely ensure that each section ([]) name will be unique.
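For comparison, the regex route mentioned in the reply above (match with a regex, then fill the structure by hand) could look roughly like the sketch below. It is only an illustration under several assumptions: it uses std::regex from the C++ standard library as a stand-in rather than Boost.Regex, the names Sections, trim and parse_sections are made up for this example, and it assumes one header or one assignment per line, as in the sample data.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: section name -> ordered list of (variable, value) pairs.
using Sections =
    std::map<std::string, std::vector<std::pair<std::string, std::string>>>;

// Hypothetical helper: strip leading and trailing spaces, tabs and CRs.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical parser: read the text line by line, classify each line with
// a regex, and fill the structure by hand.
Sections parse_sections(const std::string& text) {
    static const std::regex header(R"(\[([^\]]*)\])");     // [Unique ID]
    static const std::regex assignment(R"(([^=]+)=(.*))"); // name = value

    Sections result;
    std::string current;      // name of the section being filled
    bool in_section = false;  // assignments before the first header are skipped

    std::istringstream in(text);
    std::string raw;
    std::smatch m;
    while (std::getline(in, raw)) {
        const std::string line = trim(raw);
        if (std::regex_match(line, m, header)) {
            current = trim(m[1].str());
            result[current];  // create the section even if it stays empty
            in_section = true;
        } else if (in_section && std::regex_match(line, m, assignment)) {
            // Split at the first '='; later '=' characters stay in the value.
            result[current].emplace_back(trim(m[1].str()), trim(m[2].str()));
        }
    }
    return result;
}
```

Calling parse_sections on the contents of one file returns the sections keyed by ID. In this sketch a repeated section name is merged into the existing entry rather than rejected, and lines that are neither a header nor an assignment are silently skipped.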
How hard (or easy) is it to deal with multiple lines of parsed text in Regex?
Multiline support is the default behaviour.
Is there anything special to do? (such as defining an end of line character(s))
No. HTH, John.
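To illustrate the point made in the thread that the regex engine simply sees the input as one block of text, here is a small sketch that runs a single pattern over a multi-line string. It uses std::regex from the C++ standard library as a stand-in, not Boost.Regex itself, and the function name find_assignments is made up for this example. The pattern deliberately avoids ^ and $ and excludes line breaks explicitly, since how those anchors treat embedded newlines can depend on the library and its flags.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collect every "name = value" pair found anywhere in
// a multi-line block of text, using one regex over the whole string.
std::vector<std::pair<std::string, std::string>>
find_assignments(const std::string& text) {
    // name : shortest run of characters that are not '=', '[', ']' or a
    //        line break, with surrounding spaces/tabs left out of the capture
    // value: everything after '=' (and optional spaces) up to the line break
    static const std::regex pair_re(
        R"([ \t]*([^=\[\]\r\n]+?)[ \t]*=[ \t]*([^\r\n]*))");

    std::vector<std::pair<std::string, std::string>> out;
    for (std::sregex_iterator it(text.begin(), text.end(), pair_re), end;
         it != end; ++it) {
        out.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return out;
}
```

Header lines such as [Unique ID 1] contain no '=' and are therefore skipped by the search, which moves on to the next line within the same string.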
participants (3): John Maddock, OvermindDL1, Ramon F Herrera