
Hi,

I would like to break a file into tokens for processing. The file contains comments, which are introduced by "//", "#" and ";". Can I set up the tokenizer directly such that the comments are skipped? If not, what would you suggest to erase the comments from my string before processing? Here is what I do right now:

// CODE
ifstream is( "file.txt" );
string file, line;
file.reserve( 2 * 1024 * 1024 );
while ( getline( is, line ) )
{
    TrimHead( line );
    // Skip lines that start with "//" (the original test used '&&',
    // which wrongly skipped any line starting with a single '/').
    if ( line.size() < 2 || line[0] != '/' || line[1] != '/' )
        file.append( line + "\n" ); // Need to append "\n" again to get the right tokens - not very nice
}
typedef tokenizer<char_separator<char> > Tokenizer;
char_separator<char> sep( " \t\n" );
Tokenizer tokens( file, sep );
// END CODE

Another idea was to do the following:

// CODE
ifstream is( "file.txt" );
string line( ( istreambuf_iterator<char>( is ) ), istreambuf_iterator<char>() );
EraseComments( line );
// END CODE

Any help is appreciated.

-Dirk
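[Editorial note: `EraseComments` above is the poster's own hypothetical helper; one possible sketch of it in plain standard C++ follows. It assumes the comment markers "//", "#" and ";" never appear inside quoted strings in the input.]

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Remove everything from a comment introducer ("//", "#" or ";")
// to the end of its line. Every output line keeps its '\n' so a
// whitespace-based tokenizer still sees line boundaries.
std::string EraseComments( const std::string& in )
{
    std::string out;
    out.reserve( in.size() );
    std::string::size_type pos = 0;
    while ( pos < in.size() )
    {
        std::string::size_type eol = in.find( '\n', pos );
        if ( eol == std::string::npos ) eol = in.size();
        std::string line = in.substr( pos, eol - pos );

        // Cut at the earliest comment introducer, if any.
        // npos is the largest size_type value, so std::min works here.
        std::string::size_type cut =
            std::min( std::min( line.find( "//" ), line.find( '#' ) ),
                      line.find( ';' ) );
        if ( cut != std::string::npos ) line.erase( cut );

        out += line;
        out += '\n';
        pos = eol + 1;
    }
    return out;
}
```

Usage: `EraseComments("a b // c\nd # e")` yields `"a b \nd \n"`; the surviving text can then be fed to the tokenizer unchanged.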

Hi There,

There might be a solution for your problem, but it will require some more elaboration. Boost.Tokenizer is currently not the only option for this kind of job. There is also a splitting facility incorporated in the StringAlgo library. You can find it in the CVS.

So what's the difference, and what can you do? StringAlgo's facility is built around a concept called a finder. A finder is something that can search a string for some substring and return the location of it (represented by a pair of iterators).

There is a find_iterator facility. It allows you to iterate through the sequence over the substrings retrieved by a finder. There are two find iterators there: the first one iterates over the matching substrings, the second one over the gaps between them.

So what you can do is write a finder that will skip comments and search for your delimiter, then use split_iterator to do the tokenizing. Note that you will need to load the whole file into a string before processing, since all these facilities need at least a forward iterator, so istreambuf_iterator is not sufficient.

For documentation you can check here:
http://www.meta-comm.com/engineering/resources/cs-win32_metacomm/doc/html/st...

HTH,

Regards,
Pavol

Hello,

Thursday, October 14, 2004, 7:46:49 PM, you wrote:
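[Editorial note: the finder/gap idea Pavol describes can be sketched in plain standard C++ — this is an illustration of the concept, not the actual StringAlgo API. The finder returns the next "separator" range (a run of whitespace, or a comment up to end of line); iterating the gaps between successive matches yields the tokens.]

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::string::const_iterator Iter;
typedef std::pair<Iter, Iter> Range;   // a match: [first, second)

// A toy "finder": locates the next range to skip, treating comments
// ("//", "#", ";" up to end of line) exactly like delimiter runs.
Range find_skip( Iter begin, Iter end )
{
    for ( Iter it = begin; it != end; ++it )
    {
        if ( *it == '#' || *it == ';' ||
             ( *it == '/' && it + 1 != end && *(it + 1) == '/' ) )
        {
            Iter stop = it;
            while ( stop != end && *stop != '\n' ) ++stop;
            return Range( it, stop );          // comment
        }
        if ( *it == ' ' || *it == '\t' || *it == '\n' )
        {
            Iter stop = it;
            while ( stop != end &&
                    ( *stop == ' ' || *stop == '\t' || *stop == '\n' ) )
                ++stop;
            return Range( it, stop );          // whitespace run
        }
    }
    return Range( end, end );                  // no match
}

// Iterating the gaps between matches yields the tokens,
// which is what split_iterator does for you in StringAlgo.
std::vector<std::string> tokenize( const std::string& s )
{
    std::vector<std::string> tokens;
    Iter pos = s.begin();
    while ( pos != s.end() )
    {
        Range m = find_skip( pos, s.end() );
        if ( m.first != pos )
            tokens.push_back( std::string( pos, m.first ) ); // the gap
        pos = ( m.second == m.first ) ? s.end() : m.second;
    }
    return tokens;
}
```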

"Dirk Gregorius" <dirk@dirkgregorius.de> wrote in message news:004801c4b215$c88ba9b0$0202a8c0@master... Hi,
I like to break a file into tokens for processing. The file contains comments which are introduced by "//", "#" and ";". Can I setup the tokenizer directly such that the comments are skipped? If no, what would
you
suggest to erase the comments from my string before processing?
Since no one else has suggested these: IMO, this sounds more like an application for Spirit or regex. In Spirit you would do something approximating:

// CODE
// note this is untested but gives an idea of the
// facilities available.
std::vector<std::string> tokens;

spirit::file_iterator<> first( "input.dat" );
spirit::file_iterator<> last( first.make_end() );

// the skip parser eats whitespace and comments; comment_p("#")
// and comment_p(";") cover the other comment styles you mentioned
spirit::rule<> rSkip =
    +space_p
    | lexeme_d[ comment_p("//")
              | comment_p("#")
              | comment_p(";")
              | comment_nest_p("/*", "*/") ];

// a token is a maximal run of non-whitespace characters
spirit::rule<> rToken =
    lexeme_d[ +(anychar_p - space_p) ][ push_back_a(tokens) ];

parse_info<> lResults = parse( first, last, *rToken, rSkip );
// END CODE

Certainly I think this is worth a look on your part.

Jeff Flinn
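[Editorial note: Jeff also mentions regex. As an illustration — using std::regex from C++11 rather than the Boost.Regex of the time — all three comment styles can be stripped in one pass before tokenizing. `StripComments` is a hypothetical helper name.]

```cpp
#include <cassert>
#include <regex>
#include <string>

// Delete everything from a "//", "#" or ";" up to (not including)
// the end of the line, across the whole input in one call.
std::string StripComments( const std::string& in )
{
    static const std::regex comment( "(//|#|;)[^\n]*" );
    return std::regex_replace( in, comment, "" );
}
```

Usage: `StripComments("x // y\nz # w")` yields `"x \nz "`, which can then be handed to any whitespace tokenizer.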
participants (3)
-
Dirk Gregorius
-
Jeff Flinn
-
Pavol Droba