How to handle blank space in parsing??

Hello all,

Boost::regex has been a godsend to me. Before it, I did all my parsing with Perl. Now I am becoming comfortable with the regex package. (A million thanks to the folks responsible for it, and for Boost in general, btw.)

My only problem is spaces. I design typical mini computer language syntaxes, for example:

Table[1, 4] = Value AND Array[8, 4]

My initial approach was to get rid of all spaces before parsing (I recall being taught ages ago that the FORTRAN compiler does just that). Such a strategy works in some cases, but the above line would become:

Table[1,4]=ValueANDArray[8,4]

Which is obviously bad news. I solved this particular case with this syntax:

Table[1, 4] = Value && Array[8, 4]

My latest "language" looks like this and has the same problem described above:

Table rows [1758, 1904, 2053, 2201, 2345, 2497]
Table cols [372, 880, 1336, 1756, 2083, 2439]

I guess I could insert a colon or something between the first two words, but I am sure there has to be a better way.

TIA,

-RFH

On Mar 7, 2011, at 8:52 AM, Ramon F Herrera wrote:
> Hello all,
> Boost::regex has been a godsend to me. Before it, I did all my parsing with Perl. Now I am becoming comfortable with the regex package. (A million thanks to the folks responsible for it, and for Boost in general, btw.)
> My only problem is spaces. I design typical mini computer language syntaxes, for example:
> Table[1, 4] = Value AND Array[8, 4]
> My initial approach was to get rid of all spaces before parsing (I recall being taught ages ago that the FORTRAN compiler does just that). Such a strategy works in some cases, but the above line would become:
> Table[1,4]=ValueANDArray[8,4]
> Which is obviously bad news. I solved this particular case with this syntax:
> Table[1, 4] = Value && Array[8, 4]
> My latest "language" looks like this and has the same problem described above:
> Table rows [1758, 1904, 2053, 2201, 2345, 2497]
> Table cols [372, 880, 1336, 1756, 2083, 2439]
> I guess I could insert a colon or something between the first two words, but I am sure there has to be a better way.

I would suggest that Boost::Regex is the wrong tool here. You should take a look at Boost::Spirit instead.

-- Marshall

Marshall Clow Idio Software mailto:mclow.lists@gmail.com

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
-- Yu Suzuki

On 03/07/2011 10:52 AM, Ramon F Herrera wrote:
> Boost::regex [...] My only problem is spaces. I design typical mini computer language syntaxes, for example:
> Table[1, 4] = Value AND Array[8, 4]
> My initial approach was to get rid of all spaces before parsing (I recall being taught ages ago that the FORTRAN compiler does just that).

Most of the world seems to prefer languages with C-like syntax.

> Such a strategy works in some cases, but the above line would become:
> Table[1,4]=ValueANDArray[8,4]
> Which is obviously bad news. I solved this particular case with this syntax:
> Table[1, 4] = Value && Array[8, 4]
> My latest "language" looks like this and has the same problem described above:
> Table rows [1758, 1904, 2053, 2201, 2345, 2497]
> Table cols [372, 880, 1336, 1756, 2083, 2439]
> I guess I could insert a colon or something between the first two words, but I am sure there has to be a better way.

I usually just use \s+ in the pattern where there needs to be a space and \s* in the pattern where space is optional. E.g.

"\\s*Table\\s+rows\\s*"

If you still want to continue throwing away spaces, you could match the merged identifier possibilities:

"Table(rows|cols)" or "Table([a-z]+)"

- Marsh
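
As a rough illustration of the required-versus-optional whitespace idea above, here is a minimal sketch that parses the "Table rows [...]" lines using \s (the whitespace class), with \s+ where a space is mandatory and \s* where it is optional. It uses std::regex from the C++ standard library instead of Boost.Regex so that it builds without Boost; the pattern is assumed to carry over to Boost's default Perl-style syntax. The names TableLine and parse_table_line are invented for this example and do not come from the thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for one "Table rows [...]" / "Table cols [...]" line.
struct TableLine {
    std::string kind;         // "rows" or "cols"
    std::vector<int> values;  // the numbers between the brackets
};

// Hypothetical helper: returns true and fills `out` if `line` has the form
//   Table <rows|cols> [n, n, ...]
// Whitespace is required between "Table" and the keyword (\s+) and
// optional everywhere else (\s*).
bool parse_table_line(const std::string& line, TableLine& out) {
    static const std::regex line_re(
        R"re(\s*Table\s+(rows|cols)\s*\[\s*(\d+(?:\s*,\s*\d+)*)\s*\]\s*)re");
    static const std::regex num_re(R"re(\d+)re");

    std::smatch m;
    if (!std::regex_match(line, m, line_re)) {
        return false;  // e.g. "Tablerows [1, 2]" fails: no space after "Table"
    }

    out.kind = m[1].str();
    out.values.clear();

    // Walk over every run of digits inside the bracketed list.
    const std::string list = m[2].str();
    for (std::sregex_iterator it(list.begin(), list.end(), num_re), end;
         it != end; ++it) {
        out.values.push_back(std::stoi(it->str()));
    }
    return true;
}
```

Because the whitespace stays in the input and the pattern says where it is mandatory, "Table rows" and "Tablerows" remain distinguishable and no extra colon is needed in the syntax.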