Thanks Larry.
On Dec 15, 2007 5:23 PM, Larry wrote:
This was more of a brute-force approach that I put together when I first started using Boost a few years ago. There are probably better and/or more efficient ways to do it; it was sufficient for what I was doing.
//-----------------------------------------------------------------
// Using tokenizer
//
// char_separator (rather than escaped_list_separator) with "," in the
// kept-delimiter list, so the separator itself comes back as a token
// and empty fields cannot be silently skipped.
#include <string>
#include <boost/tokenizer.hpp>

using namespace boost;

typedef char_separator<char>            CharTokens;
typedef tokenizer<CharTokens>           CsvTokenizer;
typedef tokenizer<CharTokens>::iterator CsvIterator;

std::string  str;                                    // This has the CSV input line
CharTokens   cs("", ",", boost::keep_empty_tokens);  // drop nothing, keep ","
CsvTokenizer et(str, cs);

int field_number = 0;
for (CsvIterator eti = et.begin(); eti != et.end(); ++eti) {
    if (*eti == ",") {              // See if this is a separator
        field_number++;
    } else {
        // *eti is a value which could be an empty field;
        // field_number is the field's position in the list
    }
}
// For str = "data1,,data3" the tokens are "data1", ",", "", ",", "data3",
// so the empty field shows up between the two "," tokens.
//-----------------------------------------------------------------
// Using Spirit
//
// Result is a vector of items much like split() - including empty strings
// in the vector for empty fields
//
// Probably could be used with any<>

// Spirit (classic) headers -- exact paths depend on the Boost version
#include <boost/spirit/core.hpp>
#include <boost/spirit/utility/confix.hpp>
#include <boost/spirit/utility/escape_char.hpp>
#include <boost/spirit/utility/lists.hpp>
#include <boost/spirit/actor/push_back_actor.hpp>
#include <string>
#include <vector>

using namespace boost::spirit;

char *plist_csv = new char[4096];           // This holds the CSV input line

rule<> list_csv, list_csv_item;
std::vector<std::string> vec_item, vec_list;
parse_info<> result;

// A field is either a quoted string (escapes allowed) or a bare
// number/identifier
list_csv_item
    = confix_p('\"', *c_escape_ch_p, '\"')
    | longest_d[real_p | int_p | *(alnum_p | ch_p('_'))]
    ;

// push_back_a appends each (possibly empty) field to vec_item and the
// whole matched list to vec_list
list_csv
    = list_p((!list_csv_item)[push_back_a(vec_item)], ',')
        [push_back_a(vec_list)]
    ;

result = parse(plist_csv, list_csv);

if (result.hit) {       // Got at least part
    if (result.full) {
        // All present
    }
}
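Feeding it a line and walking the result would look roughly like this (just a sketch; the input line is made up, and <cstring>/<iostream> are assumed to be included):

strcpy(plist_csv, "data1,,data3,\"quoted, field\"");    // made-up input line
result = parse(plist_csv, list_csv);

for (std::size_t i = 0; i < vec_item.size(); ++i)
    std::cout << i << ": '" << vec_item[i] << "'\n";    // one entry per field, empties included

delete [] plist_csv;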
----- Original Message -----
From: "Christian Henning"
Newsgroups: gmane.comp.lib.boost.user
To:
Sent: Saturday, December 15, 2007 1:38 PM
Subject: Re: [boost-users] tokenizer vs string algorithm split.
Hi Larry, can you share the code which can handle empty fields?
Thanks, Christian
On Dec 15, 2007 1:32 PM, Larry wrote:
If your CSV has empty fields (e.g., data,data,,data...), the only way I found to handle them was to handle the separators yourself with the tokenizer; otherwise the tokenizer would skip the empty field (a la strtok()).
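Something like this shows the skipping (just a sketch; the input line is made up, and keep_empty_tokens is the switch that stops char_separator from dropping the empty field):

#include <iostream>
#include <iterator>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
    std::string line = "data1,,data3";                  // middle field is empty

    boost::char_separator<char> drop(",");               // default policy: drop_empty_tokens
    boost::char_separator<char> keep(",", "", boost::keep_empty_tokens);

    boost::tokenizer<boost::char_separator<char> > t_drop(line, drop);
    boost::tokenizer<boost::char_separator<char> > t_keep(line, keep);

    // 2 tokens when dropping, 3 when keeping (the empty field survives)
    std::cout << std::distance(t_drop.begin(), t_drop.end()) << " tokens with drop_empty_tokens\n";
    std::cout << std::distance(t_keep.begin(), t_keep.end()) << " tokens with keep_empty_tokens\n";
    return 0;
}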
For CSVs I tried Spirit and came up with a scheme (with lots of help, I would add) that seemed to work. Not many lines of code. It took more time than I was interested in spending to figure it out.
Larry
----- Original Message -----
From: "Edward Diener"
Newsgroups: gmane.comp.lib.boost.user
To:
Sent: Saturday, December 15, 2007 9:44 AM
Subject: Re: [boost-users] tokenizer vs string algorithm split.

Bill Buklis wrote:
This may not matter for the CSV file you're parsing, but at least for a more general solution for CSV processing, you'd also have to handle fields that are surrounded by quotes and may even contain embedded commas. I don't know if split or tokenizer can handle that.
Tokenizer's escaped_list_separator handles quotes and embedded commas properly.
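For example, something like this (a small sketch; the field contents are made up) strips the quotes and keeps the embedded comma inside one field:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main()
{
    // Quoted field with an embedded comma, plus an escaped quote
    std::string line = "plain,\"has, comma\",\"say \\\"hi\\\"\"";

    typedef boost::tokenizer<boost::escaped_list_separator<char> > Tok;
    Tok tok(line);      // defaults: escape '\', separator ',', quote '"'

    for (Tok::iterator it = tok.begin(); it != tok.end(); ++it)
        std::cout << "[" << *it << "]\n";
    // Prints: [plain]  [has, comma]  [say "hi"]
    return 0;
}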
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users