vc71 - tokenizer not working with static runtime and unicode?

The following problem has been reported on the boost-users list. If the following code is compiled with the /MD compiler option (dynamically link to the runtime library), it works, but when compiled without it (statically linked to the runtime), it produces no output.
code begin
#include <string>
#include <iostream>
#include <boost/tokenizer.hpp>

int main()
{
  typedef boost::tokenizer<boost::char_separator<std::wstring::value_type>,
                           std::wstring::const_iterator,
                           std::wstring> MyTokenizer;
  const boost::char_separator<std::wstring::value_type> sep(L"a");
  MyTokenizer token(std::wstring(L"abacadaeafag"), sep);
  for (MyTokenizer::const_iterator it = token.begin(); it != token.end(); ++it)
    std::wcout << *it << std::endl;
}
code end
I suppose there might be some problem with the vc71 static runtime library, but maybe you have another explanation?

B.

Bronek Kozicki <brok@rubikon.pl> wrote:
> I suppose there might be some problem with vc71 static runtime library, but maybe you have other explanation?
OK, I know the reason. Sorry for bothering you. We have an unusual manifestation of undefined behaviour here: tokenizer should not be initialized with a temporary string. It does not make a copy of its string argument; instead it stores only the begin and end iterators. Of course these iterators are no longer valid once the temporary string is destroyed. The program could crash, but it can also do anything else when dereferencing such a dangling iterator.

B.
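The fix implied by this diagnosis is to give the string a name so that it outlives the tokenizer. Below is a minimal sketch of that rule. It deliberately avoids Boost: IterTokenizer is a hypothetical stand-in that, like the tokenizer described above, keeps only a pair of iterators into the caller's string.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in (not boost::tokenizer): it stores only iterators
// into the caller's string and never copies the characters.
class IterTokenizer {
public:
    IterTokenizer(const std::wstring& s, wchar_t sep)
        : first_(s.begin()), last_(s.end()), sep_(sep) {}

    // Collect the non-empty tokens found between separators.
    std::vector<std::wstring> tokens() const {
        std::vector<std::wstring> out;
        std::wstring cur;
        for (std::wstring::const_iterator it = first_; it != last_; ++it) {
            if (*it == sep_) {
                if (!cur.empty()) out.push_back(cur);
                cur.clear();
            } else {
                cur += *it;
            }
        }
        if (!cur.empty()) out.push_back(cur);
        return out;
    }

private:
    std::wstring::const_iterator first_, last_;  // valid only while the source string lives
    wchar_t sep_;
};

// Safe usage: the named string outlives the tokenizer that refers to it.
std::vector<std::wstring> split_safely() {
    const std::wstring text(L"abacadaeafag");  // lives until the function returns
    IterTokenizer tok(text, L'a');
    return tok.tokens();                       // tokens are copied out before text dies
}
```

With the original program the same idea would mean declaring a named std::wstring before constructing MyTokenizer, instead of passing std::wstring(L"abacadaeafag") directly.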

At 06:12 AM 2/22/2004, Bronek Kozicki wrote:
> Bronek Kozicki <brok@rubikon.pl> wrote:
>> I suppose there might be some problem with vc71 static runtime library, but maybe you have other explanation?
> OK, I know the reason. Sorry for bothering you. We have unusual manifestation of undefined behaviour here - tokenizer should not be initialized with temporary string. It does not make a copy of its string argument, instead it stores only begin and end iterators. Of course these iterators are no longer valid once temporary string variable is destroyed. Program could crash, but it can do anything else when dereferencing such dangling iterator.
Ouch! That seems an unfortunate design decision. Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy. I wonder if that was discussed at the time? Or is there a better way to prevent the problem? --Beman
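The two constructor forms suggested here can be sketched as follows. This is only an illustration under stated assumptions: TwoFormTokenizer, make_text and contents_from_temporary are hypothetical names, the class is not Boost's tokenizer, the tokenizing logic is left out, and the copy-disabling syntax requires C++11.

```cpp
#include <string>

// Hypothetical class, not Boost's tokenizer: only the two construction
// policies are shown.
class TwoFormTokenizer {
public:
    // Iterator-pair form: no copy is made; the caller must keep the range alive.
    TwoFormTokenizer(std::string::const_iterator first,
                     std::string::const_iterator last)
        : copy_(), first_(first), last_(last) {}

    // Container form: copy the contents, then iterate over the private copy.
    explicit TwoFormTokenizer(const std::string& s)
        : copy_(s), first_(copy_.begin()), last_(copy_.end()) {}

    // A default copy would leave the iterators pointing into the other
    // object's copy_, so this sketch simply disables copying.
    TwoFormTokenizer(const TwoFormTokenizer&) = delete;
    TwoFormTokenizer& operator=(const TwoFormTokenizer&) = delete;

    // The characters this object would tokenize.
    std::string contents() const { return std::string(first_, last_); }

private:
    std::string copy_;                          // stays empty in the iterator-pair form
    std::string::const_iterator first_, last_;  // the range actually traversed
};

// Helper that returns a temporary, as in the original report.
std::string make_text() { return "abacadaeafag"; }

// Well defined with the copying form: the temporary is copied before it dies.
std::string contents_from_temporary() {
    TwoFormTokenizer t(make_text());
    return t.contents();
}
```

The cost of this design is the one raised later in the thread: every construction from a container pays for a copy, even when the caller's string would have lived long enough anyway.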

"Beman Dawes" <bdawes@acm.org> wrote in message news:4.3.2.7.2.20040222214057.031e7ab0@mailhost.esva.net...
> At 06:12 AM 2/22/2004, Bronek Kozicki wrote:
> [snip]
> Ouch! That seems an unfortunate design decision.
I agree.
> Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy. I wonder if that was discussed at the time? Or is there a better way to prevent the problem?
Why not change the existing constructor to

  template<class Container>
  tokenizer( /*const*/ Container& c, const TokenizerFunc& f = TokenizerFunc())

?

br
Thorsten

On Mon, 23 Feb 2004 19:17:08 +1100, Thorsten Ottosen wrote:
> Why not change the existing constructor to
> template<class Container> tokenizer( /*const*/ Container& c,const TokenizerFunc& f = TokenizerFunc())
Do you mean to comment out the const-ness? This would break the following sample of (I believe quite typical) tokenizer usage:

  void parse(const std::string& a)
  {
    // typedef of Tokenizer, definition of separator s goes here
    Tokenizer t(a, s);
    // etc.
  }

B.

"Bronek Kozicki" <brok@rubikon.pl> wrote in message news:17harepl17tae$.1vurpce0boti4$.dlg@40tude.net...
> On Mon, 23 Feb 2004 19:17:08 +1100, Thorsten Ottosen wrote:
>> Why not change the existing constructor to
>> template<class Container> tokenizer( /*const*/ Container& c,const TokenizerFunc& f = TokenizerFunc())
> you mean to comment out const-ness ? This would break following sample (an I believe quite typical) tokenizer usage:
>
>   void parse(const std::string& a)
>   {
>     // typedef of Tokenizer, definition of separator s goes here
>     Tokenizer t(a, s);
>     // etc.
>   }
Sure; the inefficiency of the copy-taking constructor should just be well documented then. Your proposed change would probably not break code, but it could make it much slower, silently. br Thorsten

On Sun, 22 Feb 2004 21:50:34 -0500, Beman Dawes wrote:
> Ouch! That seems an unfortunate design decision. Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy.
Actually, as far as I can see, both forms are currently provided, and both work the same way (they store begin and end iterators). I think that the proposed change will not break tokenizer. A few usage scenarios come to mind:

1. tokenizer initialized with a temporary - currently undefined behaviour, will be well defined;
2. tokenizer initialized with a const reference - will make an extra copy of the input data;
3. like 2, but afterwards the referenced object is modified (outside tokenizer, of course) - currently undefined or well-defined behaviour, depending on whether the iterators taken by the tokenizer constructor are still valid. This is where the most important change of behaviour will be seen (the collection used by tokenizer won't see changes to the input data);
4. tokenizer initialized with iterators - will work exactly the same way as currently;
5. assign from iterators - will work exactly the same way as currently;
6. assign from a const reference - will make an extra copy of the input data. Again a change of class behaviour - any change to the input data won't be seen by tokenizer;
7. assignment from a temporary - currently undefined behaviour, will be well defined.

The iterator-based constructor clearly expresses the idea that tokenizer is *iterating* over a given collection (input string). Also (IMVHO) the expected behaviour when initializing from a const reference is to make a copy, as shown by recent posts on the boost-users list. I think that the proposed change will make tokenizer behaviour closer to user expectations. Another option (not really a better one) is to add an extra bool parameter to the constructor, default = false, to express whether an extra copy of the data should be made.
> I wonder if that was discussed at the time? Or is there a better way to prevent the problem?
I remember the thread "Perfect Forwarding" where David presented a clever way to detect copy-from-temporary-rvalue, but I do not know if any of his ideas are applicable here. It would be sooo much cleaner just to add a constructor like:

  template <typename Container>
  tokenizer(const Container&& c, const TokenizerFunc& f); // make a copy

Unfortunately the proposal to add "&&" syntax (N1377) is still under Committee consideration ...

B.
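For readers with a newer compiler: rvalue references were later standardized in C++11, so the overload wished for here can be written directly. The following is a rough sketch only. RvalueAwareTokenizer and make_input are hypothetical names, the class is not Boost's tokenizer, and it takes a non-const std::string&& and moves from it, whereas the constructor proposed above takes a const rvalue reference and copies.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch (C++11 or later), not Boost's tokenizer.
class RvalueAwareTokenizer {
public:
    // Lvalue argument: store iterators only; the caller keeps the string alive.
    explicit RvalueAwareTokenizer(const std::string& s)
        : owned_(), first_(s.begin()), last_(s.end()), owns_(false) {}

    // Rvalue argument (a temporary): take over its contents so that the
    // stored iterators refer to memory this object owns.
    explicit RvalueAwareTokenizer(std::string&& s)
        : owned_(std::move(s)), first_(owned_.begin()), last_(owned_.end()),
          owns_(true) {}

    // Copying would leave the iterators pointing into another object.
    RvalueAwareTokenizer(const RvalueAwareTokenizer&) = delete;
    RvalueAwareTokenizer& operator=(const RvalueAwareTokenizer&) = delete;

    bool owns_copy() const { return owns_; }
    std::string contents() const { return std::string(first_, last_); }

private:
    std::string owned_;                         // used only for rvalue arguments
    std::string::const_iterator first_, last_;  // the range actually traversed
    bool owns_;
};

// Helper that returns a temporary.
std::string make_input() { return "x y z"; }
```

Overload resolution picks the std::string&& constructor for temporaries and the const std::string& constructor for named strings, so only temporaries pay for owning storage. A const temporary would still bind to the first constructor, which is the case the const Container&& form above would catch.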

Bronek Kozicki <brok@rubikon.pl> writes:
>> I wonder if that was discussed at the time? Or is there a better way to prevent the problem?
> I remember thread "Perfect Forwarding" where David presented clever way to detect copy-from--temporary-rvalue, but I do not know if any of his ideas are applicable here. It would be sooo much cleaner just to add constructor like:
> template <typename Container> tokenizer(const Container&& c, const TokenizerFunc& f); // make a copy
Actually, if that's the way tokenizer works, you can do it, I'm pretty sure:

---
#include <vector>
#include <iostream>

template <class TokenizerFunc>
struct tokenizer
{
    struct rvalue
    {
        template <class T>
        rvalue(T const&) {}
    };

    // handle lvalues
    template <typename Container>
    explicit tokenizer(Container& c, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "lvalue\n";
    }

    // handle rvalues
    explicit tokenizer(rvalue const& rval, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "rvalue\n";
    }
};

int main()
{
    std::vector<int> const x;
    tokenizer<int> z((x));
    tokenizer<int> z2((std::vector<int>()));
}

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

At 04:12 AM 2/24/2004, David Abrahams wrote:
> Actually, if that's the way tokenizer works, you can do it, I'm pretty sure:
> ---
I changed main() slightly (see below) and tried with several compilers:

g++ 3.3.1:

  lvalue.cpp: In function `int main()':
  lvalue.cpp:32: error: syntax error before `)' token

bcc32:

  error E2188 lvalue.cpp 32: Expression syntax in function main()
  error E2293 lvalue.cpp 32: ) expected in function main()

Codewarrior 8.3:

  lvalue
  rvalue
  lvalue
  rvalue

Intel 8.0 and VC++ 7.1:

  lvalue
  rvalue
  lvalue

Note 4th output line missing for Intel 8.0 and VC++ 7.1

What's going on here?

Mystified-in-Virginia,

--Beman

#include <vector>
#include <iostream>

template <class TokenizerFunc>
struct tokenizer
{
    struct rvalue
    {
        template <class T>
        rvalue(T const&) {}
    };

    // handle lvalues
    template <typename Container>
    explicit tokenizer(Container& c, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "lvalue\n";
    }

    // handle rvalues
    explicit tokenizer(rvalue const& rval, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "rvalue\n";
    }
};

int main()
{
    std::vector<int> const x;
    tokenizer<int> z( (x) );
    tokenizer<int> z2( (std::vector<int>()) );
    tokenizer<int> z3( x );
    tokenizer<int> z4( std::vector<int>() );
    return 0;
}

Beman Dawes wrote:
> Note 4th output line missing for Intel 8.0 and VC++ 7.1 What's going on here?
> [...]
> tokenizer<int> z4( std::vector<int>() );
The usual function declaration?

  tokenizer<int> z4( std::vector<int> f() );
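The point of this reply is that the z4 line is parsed as a declaration of a function named z4, not as an object definition, so no constructor runs and nothing is printed. A small sketch (C++11 for decltype and type traits) makes the difference observable. Probe and the two helper functions are hypothetical names introduced only for this illustration.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical type standing in for tokenizer<int> in the test program.
struct Probe {
    explicit Probe(const std::vector<int>&) {}
};

// Without extra parentheses the line declares a function: z4 takes a
// pointer to a function returning std::vector<int> and returns a Probe.
bool plain_form_declares_function() {
    Probe z4(std::vector<int>());
    return std::is_function<decltype(z4)>::value;
}

// With extra parentheses the argument can only be an expression, so z2
// is an object constructed from a temporary vector.
bool parenthesized_form_declares_object() {
    Probe z2((std::vector<int>()));
    return std::is_same<decltype(z2), Probe>::value;
}
```

Both functions return true. Compilers commonly warn about the first form; the doubled parentheses in the z and z2 lines of the test program sidestep it.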
participants (5)
- Beman Dawes
- Bronek Kozicki
- David Abrahams
- Peter Dimov
- Thorsten Ottosen