Re: [boost] [Tokenizer]Usage and documentation

9 Feb 2011

On Tue, Feb 8, 2011 at 3:13 PM, Max <more4less@sina.com> wrote:
...
> I'm using boost::tokenizer to do some simple parsing of a data file in a
> format specified by the following rules:
> - One record of several fields on a single line
> - Adjacent data fields in a record are separated by whitespace characters
>   (space or tab), with or without a ","
> - A string without spaces, with or without quotation marks
> - A string with spaces, with quotation marks
> An example of a 4-field-per-record file:
> "string  2"   3  4        5  4.3
> "String",     2,  3.04    4  3
> AnyOtherText, 2,  3.04    4  3

I normally use Boost.Regex's regex_token_iterator for this sort of task.
Try the following regex:

"([^"]*)"|([^[:space:],"]+)

and tell regex_token_iterator to extract sub-matches 1 and 2. Only one of
the two participates in any given match; the other comes back unmatched.

The above regex has a couple of quirks: "a""b" will be taken as two
fields, "a" and "b". a,,b will be taken as two fields, not three.
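
A minimal sketch of the regex_token_iterator approach follows. It uses
std::regex from the C++11 standard library rather than Boost.Regex so that it
builds without Boost; std::sregex_token_iterator has the same shape as
boost::sregex_token_iterator. The function name split_record and the exact
pattern (a quoted field in capture 1, or a bare run of non-separator
characters in capture 2) are assumptions made for illustration. Each match
yields both captures in turn, so the one that did not participate is skipped.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits one record (one line) into its fields.
// Capture 1 holds the contents of a quoted field, capture 2 a bare field.
std::vector<std::string> split_record(const std::string& line) {
    static const std::regex field_re(R"re("([^"]*)"|([^[:space:],"]+))re");

    std::vector<std::string> fields;
    // Ask the iterator for sub-matches 1 and 2 of every match.
    std::sregex_token_iterator it(line.begin(), line.end(), field_re, {1, 2});
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // The capture from the alternative that was not taken is unmatched.
        if (it->matched) {
            fields.push_back(it->str());
        }
    }
    return fields;
}
```

Under these assumptions, split_record applied to the line
"String",     2,  3.04    4  3 should yield the five fields String, 2, 3.04,
4 and 3, with the quotation marks stripped.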

To read the file line by line, simply use std::getline.

Yechezkel Mett