Re: [Boost-users] Tokenizer: BCB AnsiString and std:string

6 Oct 2005

      CN wrote:
...
Hi!
The following version incorrectly produces the result:
< > 1
//----------------------
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <vcl.h>
#pragma hdrstop
#pragma argsused
int main(int argc, char* argv[])
{
  using namespace std;
  using namespace boost;
  AnsiString As = "*";
  typedef tokenizer<boost::char_separator<char> > Tok;
  char_separator<char> sep(", ");
  Tok tokens(string(As.c_str(),As.Length()), sep);
  for(Tok::iterator tok_iter = tokens.begin();tok_iter != tokens.end();
      ++tok_iter)
    cout << "<" << *tok_iter << "> " << (*tok_iter).size();
  return EXIT_SUCCESS;
}
//----------------------
However, this version correctly outputs this result:
<*> 1
//----------------------
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <vcl.h>
#pragma hdrstop
#pragma argsused
int main(int argc, char* argv[])
{
  using namespace std;
  using namespace boost;
  AnsiString As = "*";
  string str=string(As.c_str(),As.Length());
  typedef tokenizer<boost::char_separator<char> > Tok;
  char_separator<char> sep(", ");
  Tok tokens(str, sep);
  for(Tok::iterator tok_iter = tokens.begin();tok_iter != tokens.end();
      ++tok_iter)
    cout << "<" << *tok_iter << "> " << (*tok_iter).size();
  return EXIT_SUCCESS;
}
//----------------------
Why does the first version malfunction?
TIA and Regards,
CN
The first version creates a temporary string that exists only long 
enough to call tokens' constructor. A tokenizer keeps iterators into 
the string it was built on rather than a copy of the tokens (for space 
reasons), so immediately after tokens is constructed it refers to a 
string that has already been destroyed. This is a Bad Thing, and you are 
lucky it failed the first time rather than silently breaking later. 
The second version has a named variable holding the string, which 
outlives tokens, so the iterators stay valid.
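
To make the lifetime issue concrete, here is a rough standard-library-only 
sketch of a tokenizer that stores nothing but two iterators. RangeTokenizer 
and split_safely are made-up names for illustration, not part of Boost, and 
the real boost::tokenizer is more general than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tokenizer: it stores only two iterators into
// the caller's string and never copies the characters themselves.
class RangeTokenizer {
public:
    RangeTokenizer(const std::string& source, const std::string& separators)
        : first_(source.begin()), last_(source.end()), separators_(separators) {}

    // Walk the stored range and collect every non-empty token.
    std::vector<std::string> tokens() const {
        std::vector<std::string> out;
        std::string current;
        for (std::string::const_iterator it = first_; it != last_; ++it) {
            if (separators_.find(*it) != std::string::npos) {
                if (!current.empty()) {
                    out.push_back(current);
                    current.clear();
                }
            } else {
                current += *it;
            }
        }
        if (!current.empty()) out.push_back(current);
        return out;
    }

private:
    std::string::const_iterator first_;  // points into the caller's string
    std::string::const_iterator last_;   // one past its last character
    std::string separators_;             // small, so it is copied
};

// Safe pattern: the named string 'owned' outlives the tokenizer built on it.
// Writing RangeTokenizer tok(std::string(data, length), ", ") instead would
// leave first_ and last_ dangling once the constructor returns.
std::vector<std::string> split_safely(const char* data, std::size_t length) {
    assert(data != 0);
    std::string owned(data, length);
    RangeTokenizer tok(owned, ", ");
    return tok.tokens();
}
```

With an AnsiString this would be called as split_safely(As.c_str(), As.Length()).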

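Why don't you just use 'As' directly?  Does AnsiString not provide 
begin()/end()?  If it's just that its versions are named differently, 
you can do:

Tok tokens(As.Begin(), As.End(), sep); // Or whatever begin()/end() are called

(Tok's iterator type, the second template argument of tokenizer, has to 
match whatever those functions return.)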

PS: It's conventional to name variables with a lower-case first letter.

Simon Buchan