This was more of a brute-force approach that I came up with when I first
started using Boost a few years ago. There are probably better and/or more
efficient ways to do it, but it was sufficient for what I was doing.
//-----------------------------------------------------------------
// Using tokenizer
#include <string>
#include <boost/tokenizer.hpp>
using namespace boost;
// char_separator (not escaped_list_separator) is the separator type that
// accepts keep_empty_tokens. "," is passed as a kept delimiter so that
// each separator comes back as a token of its own.
typedef char_separator<char> CharTokens;
typedef tokenizer<CharTokens> CsvTokenizer;
typedef CsvTokenizer::iterator CsvIterator;
CharTokens cs("", ",", boost::keep_empty_tokens);
std::string str; // This has CSV input line
int field_number = 0; // Index of the current field
CsvTokenizer et(str, cs);
for (CsvIterator eti = et.begin(); eti != et.end(); ++eti) {
    if (*eti == ",") { // See if this is a separator
        field_number++;
    } else {
        // *eti is a value, which could be an empty field
        // field_number is the field in the list
    }
}
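For anyone wondering why the separators have to be handled by hand: strtok()-style splitting treats a run of adjacent delimiters as one, so the empty field in data,data,,data vanishes. Here is a minimal Boost-free sketch of the difference, using only the standard library. The names split_strtok and split_keep_empty are made up for this illustration; they are not part of Boost or of the code above.

```cpp
#include <cstring>
#include <string>
#include <vector>

// strtok()-style split: adjacent delimiters collapse into one, so empty
// fields never show up in the result.
std::vector<std::string> split_strtok(const std::string& line) {
    std::vector<std::string> fields;
    // strtok() modifies its input, so work on a NUL-terminated copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    for (char* tok = std::strtok(&buf[0], ","); tok != NULL;
         tok = std::strtok(NULL, ",")) {
        fields.push_back(tok);
    }
    return fields;
}

// Manual split: every comma ends a field, so empty fields are kept.
std::vector<std::string> split_keep_empty(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start)); // last field, may be empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

With the input data,data,,data the first function returns three fields and the second returns four, the third of which is empty.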
//-----------------------------------------------------------------
// Using Spirit
//
// Result is a vector of items, much like split(), including empty strings
// in the vector for empty fields
//
// Probably could be used with any<>
using namespace boost::spirit;
char *plist_csv = new char[4096];
rule<> list_csv, list_csv_item;
std::vector<std::string> vec_item, vec_list;
parse_info<> result;
list_csv_item =
confix_p('\"', *c_escape_ch_p, '\"')
| longest_d[real_p | int_p | *(alnum_p | ch_p('_'))]
;
list_csv =
list_p(
(!list_csv_item)[append(vec_item)],
',') [append(vec_list)]
;
result = parse(plist_csv,list_csv);
if (result.hit) { // Got at least part
if (result.full) {
// All present
}
}
----- Original Message -----
From: "Christian Henning"
Hi Larry, can you share the code which can handle empty fields?
Thanks, Christian
On Dec 15, 2007 1:32 PM, Larry wrote:
If your CSV has empty fields (e.g., data,data,,data.....), the only way I found to handle them was to handle the separators yourself with the tokenizer; otherwise the tokenizer would skip the field (a la strtok()).
For CSVs I tried Spirit and came up with a scheme (with lots of help, I would add) that seemed to work. Not many lines of code, but it took more time to figure out than I was interested in spending.
Larry
----- Original Message -----
From: "Edward Diener"
Newsgroups: gmane.comp.lib.boost.user
To:
Sent: Saturday, December 15, 2007 9:44 AM
Subject: Re: [boost-users] tokenizer vs string algorithm split.
Bill Buklis wrote:
This may not matter for the CSV file you're parsing, but at least for a more general solution for CSV processing, you'd also have to handle fields that are surrounded by quotes and may even contain embedded commas. I don't know if split or tokenizer can handle that.
Tokenizer's escaped_list_separator handles quotes and embedded commas properly.
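To make the requirement concrete (quoted fields that may contain commas), here is a simplified, Boost-free sketch of the kind of state machine such a separator needs. It is not Boost's implementation: split_csv_quoted is a hypothetical helper written for this illustration, and its backslash handling (take the next character literally) is an assumption chosen for brevity.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: split one CSV line on commas,
// treating commas inside double quotes as data and keeping empty fields.
std::vector<std::string> split_csv_quoted(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {
            field += line[++i];      // assumed rule: next char is literal
        } else if (c == '"') {
            in_quotes = !in_quotes;  // quotes toggle state, are not copied
        } else if (c == ',' && !in_quotes) {
            fields.push_back(field); // an unquoted comma ends the field
            field.clear();
        } else {
            field += c;              // ordinary character or quoted comma
        }
    }
    fields.push_back(field);         // last field, possibly empty
    return fields;
}
```

For the line a,"b,c",,d this yields four fields: a, then b,c with the embedded comma intact, then an empty field, then d.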