Re: [Boost-users] [boost-users] tokenizer vs string algorithm split.

15 Dec 2007


This was more of a brute-force approach that I took when I first started
using Boost a few years ago. There are probably better and/or more
efficient ways to do it, but it was sufficient for what I was doing.

//-----------------------------------------------------------------
//  Using tokenizer

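//  char_separator takes (dropped delimiters, kept delimiters,
//  empty-token policy). Keeping "," as a token lets the loop count
//  the fields itself.

#include <string>
#include <boost/tokenizer.hpp>

using namespace boost;

typedef char_separator<char> CharTokens;
typedef tokenizer<CharTokens> CsvTokenizer;
typedef CsvTokenizer::iterator CsvIterator;

// Drop nothing, return each "," as its own token, keep empty tokens
CharTokens cs("", ",", boost::keep_empty_tokens);
std::string str;        // This has CSV input line
int field_number = 0;   // Index of the field currently being read

CsvTokenizer et(str, cs);

for (CsvIterator eti = et.begin(); eti != et.end(); ++eti) {
    if (*eti == ",") {    // See if this is a separator
        field_number++;
    } else {
        // *eti is a value, which could be an empty field
        // field_number is the field in the list
    }
}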


//-----------------------------------------------------------------
// Using Spirit
//
// Result is a vector of items much like split() - including empty
// strings in the vector for empty fields
//
// Probably could be used with any<>

using namespace boost::spirit;

char *plist_csv = new char[4096];   // Holds the CSV input line

rule<> list_csv, list_csv_item;
std::vector<std::string> vec_item, vec_list;
parse_info<> result;

list_csv_item =
       confix_p('\"', *c_escape_ch_p, '\"')
       | longest_d(real_p | int_p | *(alnum_p | ch_p('_')))
    ;

list_csv =
        list_p(
            (!list_csv_item)[append(vec_item)],
            ',') [append(vec_list)]
    ;

result = parse(plist_csv,list_csv);

if (result.hit) {   // Got at least part
    if (result.full) {
        // All present
    }
}


----- Original Message ----- 
From: "Christian Henning" <chhenning@gmail.com>
Newsgroups: gmane.comp.lib.boost.user
To: <boost-users@lists.boost.org>
Sent: Saturday, December 15, 2007 1:38 PM
Subject: Re: [boost-users] tokenizer vs string algorithm split.
...
Hi Larry, can you share the code which can handle empty fields?
Thanks,
Christian
On Dec 15, 2007 1:32 PM, Larry <lknain@nc.rr.com> wrote:
...
If your CSV has empty fields (e.g., data,data,,data.....) the only way I
found to handle the empty field was to handle the separators yourself with
the tokenizer; otherwise the tokenizer would skip the field (a la strtok()).
For CSVs I tried Spirit and came up with a scheme (with lots of help, I
would add) that seemed to work. Not many lines of code. It takes more time
than I was interested in spending to figure it out.
Larry
----- Original Message -----
From: "Edward Diener" <eldiener@tropicsoft.com>
Newsgroups: gmane.comp.lib.boost.user
To: <boost-users@lists.boost.org>
Sent: Saturday, December 15, 2007 9:44 AM
Subject: Re: [boost-users] tokenizer vs string algorithm split.
Bill Buklis wrote:
...
This may not matter for the CSV file you're parsing, but at least for a
more general solution for CSV processing, you'd also have to handle
fields that are surrounded by quotes and may even contain embedded
commas. I don't know if split or tokenizer can handle that.
Tokenizer's escaped_list_separator handles quotes and embedded commas
properly.