I am using Boost 1.39 (the latest release) with gcc 4.2.1 on SUSE Linux.
I am trying to load a file and then split its lines into a vector of strings.
However, when I run the program, the last string comes out corrupt.
When I run it under valgrind, the very first error is an invalid read of size 1, in the guts of boost::char_separator:
==19963== Invalid read of size 1
==19963== at 0x8056870: bool boost::char_separator<char, std::char_traits<char> >::operator()<__gnu_cxx::__normal_iterator<char const*, std::string>, std::string>(__gnu_cxx::__normal_iterator<char const*, std::string>&, __gnu_cxx::__normal_iterator<char const*, std::string>, std::string&) (token_functions.hpp:430)
==19963== by 0x8056E3E: boost::token_iterator<boost::char_separator<char, std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*, std::string>, std::string>::initialize() (token_iterator.hpp:70)
==19963== by 0x8056EA6: boost::token_iterator<boost::char_separator<char, std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*, std::string>, std::string>::token_iterator(boost::char_separator<char, std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*, std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>) (token_iterator.hpp:77)
==19963== by 0x8056FAA: boost::tokenizer<boost::char_separator<char, std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*, std::string>, std::string>::begin() const (tokenizer.hpp:86)
Here is the program:
BOOST_AUTO_TEST_CASE( test_log_append ) {
    string logFile = "test/logfile.txt";

    // Load the log file into a vector of strings, and test the content
    ifstream ifs(logFile.c_str());
    BOOST_REQUIRE_MESSAGE(ifs, "Could not open log file\n");

    // Read the whole file into a string
    stringstream ss;
    ss << ifs.rdbuf();

    // Split the file content into lines (Unix = \n, Windows = \r\n)
    char_separator<char> sep("\n");
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    tokenizer tokens(ss.str(), sep); // <<<<<<<< valgrind barfs here
    std::vector<std::string> lines;
    lines.reserve(9);
    std::copy(tokens.begin(), tokens.end(), back_inserter(lines)); // <<<<<<<< valgrind barfs here

    for (std::size_t i = 0; i < lines.size(); i++) {
        cerr << "'" << lines[i] << "'\n";
    }
}
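One thing that may matter in the program above: ss.str() returns a temporary std::string, and as far as I can tell boost::tokenizer keeps iterators into the container it is given rather than a copy of it, so copying the result of ss.str() into a named std::string before constructing the tokenizer looks like the first thing to try. For comparison, here is a minimal sketch that sidesteps the tokenizer altogether and splits with std::getline from the standard library. The helper name split_lines is hypothetical (it is not part of the test above), and skipping blank lines is an assumption made to mimic char_separator, which drops empty tokens by default.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the original test case: split text into
// lines with std::getline instead of boost::tokenizer.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);  // the stream holds its own copy of the text
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' so Windows-style "\r\n" endings work too.
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        // Assumption: skip blank lines, mimicking char_separator's default
        // policy of dropping empty tokens.
        if (!line.empty()) {
            lines.push_back(line);
        }
    }
    return lines;
}
```

In the test case this would be called as std::vector<std::string> lines = split_lines(ss.str()); passing the temporary is safe here because the function copies the text into its own stream and nothing refers to the temporary after the call returns.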
The input in logfile.txt is of the form:
MSG:[16:36:09 14.7.2009] First Message
LOG:[16:36:09 14.7.2009] LOG
WAR:[16:36:09 14.7.2009] ERROR
ERR:[16:36:09 14.7.2009] WARNING
DBG:[16:36:09 14.7.2009] DEBUG
OTH:[16:36:09 14.7.2009] OTHER
OTH:[16:36:09 14.7.2009] OTHER2
MSG:[16:36:09 14.7.2009] Last Message
The output is of the form:
'MSG:[16:36:09 14.7.2009] First Message'
'LOG:[16:36:09 14.7.2009] LOG'
'WAR:[16:36:09 14.7.2009] ERROR'
'ERR:[16:36:09 14.7.2009] WARNING'
'DBG:[16:36:09 14.7.2009] DEBUG'
'OTH:[16:36:09 14.7.2009] OTHER'
'OTH:[16:36:09 14.7.2009] OTHER2'
'�:[16:36:09 14.7.2009] Last Message'
Notice that the last string is corrupt.
Is the tokenizer known to be buggy in Boost 1.39, or am I doing it all wrong?
Best regards,
Ta,
Avi