The Tokenizer library has a char_separator with the option to keep delimiters, drop delimiters, and keep or drop empty tokens. However, with escaped_list_separator, the only behavior is to keep empty tokens. While this is the obvious behavior for parsing csv and similar files, it would be nice to have the ability to also drop empty tokens when constructing an escaped_list_separator.

I have a command line parser that either reads its arguments from the command line itself or a text file supplied on the command line. In the file I'm passing in formats for the Date Time library I/O routines, and the formats have spaces that I'm escaping so the format will be a single token, which Tokenizer does find. But I sometimes use multiple tabs to separate my fields so it will look pretty in a text editor, and escaped_list_separator is keeping these. The solution for now is to have a switch in my command line parser for which separator I want to use.

thanks,
matthew
On Tue, Jul 14, 2009 at 11:48 AM, Polder, Matthew J wrote:
The Tokenizer library has a char_separator with the option to keep delimiters, drop delimiters, and keep or drop empty tokens. However, with escaped_list_separator, the only behavior is to keep empty tokens. While this is the obvious behavior for parsing csv and similar files, it would be nice to have the ability to also drop empty tokens when constructing an escaped_list_separator.
I have a command line parser that either reads its arguments from the command line itself or a text file supplied on the command line. In the file I’m passing in formats for the Date Time library I/O routines, and the formats have spaces that I’m escaping so the format will be a single token, which Tokenizer does find. But I sometimes use multiple tabs to separate my fields so it will look pretty in a text editor, and escaped_list_separator is keeping these. The solution for now is to have a switch in my command line parser for which separator I want to use.
In the loop where you process tokens, you should be able to deal with this by simply skipping the empty ones:

    mytok::iterator begin = toker.begin();
    mytok::iterator end = toker.end();
    for (; begin != end; ++begin) {
        if (begin->empty())
            continue;  // skip empty tokens
        // do normal token processing
    }

I guess it's doing this because it was originally designed to support CSV files, which can contain empty fields. In a CSV file, ,, represents an empty field, so in your case <space><space> would represent an empty field too. Since it's an empty field, the value of *iter is the empty string, and there should be no other time when it evaluates to an empty string.

If nothing else, it's a not-too-hackish workaround, but maybe a constructor argument bool ignore_empty_fields with a default value of false would be nice too.
participants (2)
- Polder, Matthew J
- Zachary Turner