Hello all,

I have the following table:

OPERATING REVENUES: publishing $ 42,419 $ 44,754 $ 46,203 collegiate marketing and production services 97 ASSOCIATION MANAGEMENT SERVICES
16 wireless 8,883 8,129 7,507
51,302 52,883 53,823

All I know is that "OPERATING REVENUES:" will always be there. The question is how to write a regular expression that captures the total (which is 51,302 here). There might be more or fewer than four rows in the table. I would really appreciate any good suggestions on this.

/Winson
I'm assuming that the difference between the sub-totals and the totals is that the sub-totals always have a header? If so, then off the top of my head (caution: untried!) something like:

"OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+" // tag line
"(?:" // group sub-totals
"\\s*[^\\d$][^\r\n]*[\r\n]+[^\r\n]+[\r\n]+" // sub-total=two lines
")*" // close group and repeat
"\\s+\\$?([\\d,.]+)" // capture total

HTH, John.
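For anyone who wants to try the expression above, here is a minimal sketch of how it might be applied. It uses std::regex (ECMAScript grammar) instead of Boost.Regex purely to stay self-contained, and it writes the final capture group as ([\\d,.]+). The function name extract_total, the constant kSample, and the sample layout are all assumptions made for illustration: the sample puts every label on its own line with the figures on the next line, and indents the total, which is what the "sub-total=two lines" comment and the leading "\\s+" appear to expect.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first figure of the total line, or an empty string if the
// pattern does not match. Hypothetical helper, not part of any library.
std::string extract_total(const std::string& text) {
    static const std::regex total_re(
        "OPERATING\\s+REVENUES:[[:blank:]]*[\r\n]+"   // tag line
        "(?:"                                         // group sub-totals
        "\\s*[^\\d$][^\r\n]*[\r\n]+[^\r\n]+[\r\n]+"   // sub-total = two lines
        ")*"                                          // close group and repeat
        "\\s+\\$?([\\d,.]+)");                        // capture total

    std::smatch m;
    if (!std::regex_search(text, m, total_re)) {
        return std::string();
    }
    assert(m.size() == 2);  // whole match plus the single capture group
    return m[1].str();
}

// Assumed layout (NOT the layout posted in the thread): label line, then a
// line of figures, with the unlabelled total as the last indented line.
const std::string kSample =
    "OPERATING REVENUES:\n"
    "  publishing\n"
    "    $ 42,419 $ 44,754 $ 46,203\n"
    "  collegiate marketing and production services\n"
    "    97\n"
    "  ASSOCIATION MANAGEMENT SERVICES\n"
    "    16\n"
    "  wireless\n"
    "    8,883 8,129 7,507\n"
    "    51,302 52,883 53,823\n";
```

Calling extract_total(kSample) should yield "51,302" for this input. Note that the sample ends right after the total line; if more text follows, the greedy sub-total group may swallow the total line as well, so the pattern would likely need tightening for real reports.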
Thank you, John. Right, the difference is in the header. Your regular
expression doesn't quite make sense to me, though: where is the part that
matches the sub-total header?
(In reply to John Maddock's message of 8/2/06.)
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
consume whitespace: "\\s+"
match something that is not a number: "[\\d$]"
consume the rest of that line: "[^\r\n]*[\r\n]+"
consume the next line as well: "[^\r\n]+[\r\n]+"

All of that was in a (?: ... )+ block since it can be repeated several times before we get to a line with no header.

Clear now?

John.
Sorry about the original text formatting; it completely garbled the way the
table looked.
But yes, thanks for the clarification. I think that at least this:
match something that is not a number: "[\\d$]"
should be written as:
match something that is not a number: "[^\\d$]"
OPERATING REVENUES:
publishing $ 42,419 $ 44,754 $ 46,203
collegiate marketing and production services 97
ASSOCIATION MANAGEMENT SERVICES 16
wireless 8,883 8,129 7,507
51,302 52,883 53,823
Yes quite right, sorry! John.
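The corrected class "[^\\d$]" is what tells a labelled row from the unlabelled total line. As a rough sketch only, and not something posted in the thread, the same idea could be adapted to the reposted layout, where each label shares a line with its figures, so that a sub-total is one line rather than two. The pattern below is an assumption, as are the names extract_total_inline_labels and kInlineSample; it uses std::regex instead of Boost.Regex to stay self-contained, and it adds \s to the negated class so that leading blanks are not mistaken for a label.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first figure of the first unlabelled line after the tag line,
// or an empty string if nothing matches. Hypothetical helper.
std::string extract_total_inline_labels(const std::string& text) {
    static const std::regex total_re(
        "OPERATING\\s+REVENUES:[^\r\n]*[\r\n]+"        // tag line
        "(?:"                                          // group labelled rows
        "[[:blank:]]*[^\\s\\d$][^\r\n]*[\r\n]+"        // row starts with a label
        ")*"                                           // any number of rows
        "[[:blank:]]*\\$?\\s*([\\d,.]+)");             // unlabelled line: total

    std::smatch m;
    if (!std::regex_search(text, m, total_re)) {
        return std::string();
    }
    assert(m.size() == 2);  // whole match plus the single capture group
    return m[1].str();
}

// The table as reposted in the thread, one row per line.
const std::string kInlineSample =
    "OPERATING REVENUES:\n"
    "publishing $ 42,419 $ 44,754 $ 46,203\n"
    "collegiate marketing and production services 97\n"
    "ASSOCIATION MANAGEMENT SERVICES 16\n"
    "wireless 8,883 8,129 7,507\n"
    "51,302 52,883 53,823\n";
```

Because the row group repeats zero or more times, the number of rows above the total does not matter; extract_total_inline_labels(kInlineSample) should give "51,302". The sketch assumes that every row label begins with a character that is neither a digit nor "$".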
participants (2)
- John Maddock
- Winson Yung