
Hello all, I have the following table:
OPERATING REVENUES:
publishing $ 42,419 $ 44,754 $ 46,203
collegiate marketing and production services 97
ASSOCIATION MANAGEMENT SERVICES 16
wireless 8,883 8,129 7,507
51,302 52,883 53,823
All I know is that "OPERATING REVENUES:" will always be there. The question is how to write a regular expression that captures the total (which is 51,302 here). There might be more or fewer than four rows in the table. I would really appreciate any good suggestions on this.
/Winson

Winson Yung wrote:
> Hello all, I have the following table:
> OPERATING REVENUES:
> publishing $ 42,419 $ 44,754 $ 46,203
> collegiate marketing and production services 97
> ASSOCIATION MANAGEMENT SERVICES 16
> wireless 8,883 8,129 7,507
> 51,302 52,883 53,823
> All I know is that "OPERATING REVENUES:" will always be there. The question is how to write a regular expression that captures the total (which is 51,302 here). There might be more or fewer than four rows in the table. I would really appreciate any good suggestions on this.
I'm assuming that the difference between the sub-totals and the total is that the sub-totals always have a header? If so, then off the top of my head (caution: untried!) something like:
    "OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"   // tag line
    "(?:"                                         // group sub-totals
    "\\s*[^\\d$][^\r\n]*[\r\n]+[^\r\n]+[\r\n]+"   // sub-total = two lines
    ")*"                                          // close group and repeat
    "\\s+\\$?([\\d,.]+)"                          // capture total
HTH, John.

Thank you John. Right, the difference is in the header. However, your regular
expression doesn't quite make sense to me: where is the part that
matches the sub-total header?
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Winson Yung wrote:
> Thank you John. Right, the difference is in the header. However, your regular expression doesn't quite make sense to me: where is the part that matches the sub-total header?
consume whitespace: "\\s*"
match something that is not a number: "[\\d$]"
consume the rest of that line: "[^\r\n]*[\r\n]+"
consume the next line as well: "[^\r\n]+[\r\n]+"
All of that was in a (?: ... )* block, since it can be repeated several times before we get to a line with no header.
Clear now?
John.
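For reference, the pieces described above can be assembled into a small function. This is only a sketch under several assumptions: it uses std::regex from the C++ standard library in place of Boost.Regex (boost::regex, boost::smatch and boost::regex_search are used the same way), the names extract_total and kSample are made up for illustration, and the sample text is laid out the way the pattern expects (a header line followed by a line of figures for each sub-total), which is not necessarily how the real file looks. The "not a number" class is written negated, as in the pattern first posted, and the final "\\s+" is relaxed to "\\s*" so that a total line with no leading whitespace still matches.

```cpp
#include <regex>
#include <string>

// Hypothetical sample in the layout the pattern assumes: each sub-total is a
// header line followed by a line of figures; the total line has no header.
const std::string kSample =
    "OPERATING REVENUES:\n"
    "publishing\n"
    "$ 42,419 $ 44,754 $ 46,203\n"
    "wireless\n"
    "8,883 8,129 7,507\n"
    "51,302 52,883 53,823\n";

// Returns the first figure on the total line, or an empty string when the
// pattern does not match. The function name is illustrative only.
std::string extract_total(const std::string& text) {
    static const std::regex pattern(
        "OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"   // tag line
        "(?:"                                         // group sub-totals
        "\\s*[^\\d$][^\r\n]*[\r\n]+"                  // header: starts with a non-digit, non-$
        "[^\r\n]+[\r\n]+"                             // the figures line that follows it
        ")*"                                          // zero or more sub-totals
        "\\s*\\$?([\\d,.]+)");                        // capture the first total figure
    std::smatch m;
    return std::regex_search(text, m, pattern) ? m[1].str() : std::string();
}
```

Calling extract_total(kSample) on the sample above should give "51,302"; text without the "OPERATING REVENUES:" tag line gives an empty string.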

Sorry about the original text formatting; it totally messed up the way the
table looked.
But yeah, thanks for the clarification. I think at least this:
match something that is not a number: "[\\d$]"
should be written as:
match something that is not a number: "[^\\d$]"
OPERATING REVENUES:
publishing $ 42,419 $ 44,754 $ 46,203
collegiate marketing and production services 97
ASSOCIATION MANAGEMENT SERVICES 16
wireless 8,883 8,129 7,507
51,302 52,883 53,823

Winson Yung wrote:
> Sorry about the original text formatting; it totally messed up the way the table looked. But yeah, thanks for the clarification. I think at least this:
> match something that is not a number: "[\\d$]"
> should be written as:
> match something that is not a number: "[^\\d$]"
Yes, quite right, sorry!
John.
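If the table really has one row per line, as in the re-posted layout (label and figures together, total line without a label), a group that consumes two lines at a time only lines up when the row count is even. A possible variation, offered as a sketch rather than a tested solution, treats every line that starts with a non-digit, non-$ character as a row and captures the first figure of the first line that does not. It again uses std::regex instead of Boost.Regex, and the names extract_table_total and kTable are hypothetical.

```cpp
#include <regex>
#include <string>

// The table as re-posted in the thread, one row per line.
const std::string kTable =
    "OPERATING REVENUES:\n"
    "publishing $ 42,419 $ 44,754 $ 46,203\n"
    "collegiate marketing and production services 97\n"
    "ASSOCIATION MANAGEMENT SERVICES 16\n"
    "wireless 8,883 8,129 7,507\n"
    "51,302 52,883 53,823\n";

// Returns the first figure on the first unlabelled line after the tag line,
// or an empty string when nothing matches. Illustrative name only.
std::string extract_table_total(const std::string& text) {
    static const std::regex pattern(
        "OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"   // tag line
        "(?:"                                         // any number of labelled rows
        "[[:blank:]]*[^\\d$\\s][^\r\n]*[\r\n]+"       // row: first non-blank is not a digit or $
        ")*"
        "[[:blank:]]*\\$?([\\d,.]+)");                // first figure on the total line
    std::smatch m;
    return std::regex_search(text, m, pattern) ? m[1].str() : std::string();
}
```

Because each row is matched on its own, the number of rows can be odd or even. Note that, like the pattern it is derived from, this does not allow a space between a "$" sign and the total.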
Participants (2): John Maddock, Winson Yung