vc71 - tokenizer not working with static runtime and unicode? - Boost


vc71 - tokenizer not working with static runtime and unicode?

Bronek Kozicki

21 Feb 2004

8:54 p.m.

The following problem has been reported on the boost-users list. If the following code is compiled with the compiler option /MD (dynamically linked runtime library) it works, but when compiled without it (statically linked runtime library) it produces no output.

code begin

#include <string>
#include <iostream>
#include <boost/tokenizer.hpp>

int main()
{
    typedef boost::tokenizer<boost::char_separator<std::wstring::value_type>,
                             std::wstring::const_iterator,
                             std::wstring> MyTokenizer;
    const boost::char_separator<std::wstring::value_type> sep(L"a");
    MyTokenizer token(std::wstring(L"abacadaeafag"), sep);
    for (MyTokenizer::const_iterator it = token.begin(); it != token.end(); ++it)
        std::wcout << *it << std::endl;
}

code end

I suppose there might be some problem with the vc71 static runtime library, but maybe you have another explanation?

B.

Bronek Kozicki

22 Feb 2004

11:12 a.m.

Bronek Kozicki <brok@rubikon.pl> wrote:

> I suppose there might be some problem with vc71 static runtime library, but maybe you have other explanation?

OK, I know the reason. Sorry for bothering you. We have an unusual manifestation of undefined behaviour here: tokenizer should not be initialized with a temporary string. It does not make a copy of its string argument; instead it stores only the begin and end iterators. Of course these iterators are no longer valid once the temporary string is destroyed. The program could crash, but it could do anything else when dereferencing such a dangling iterator.

B.
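To make the lifetime issue concrete, here is a minimal sketch using only the standard library. `IterPairTokenizer` and `split_safely` are hypothetical names invented for this illustration; the class merely imitates the behaviour described above (storing a begin/end iterator pair without copying) and is not the Boost implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a tokenizer that, like the one described above,
// keeps only a [begin, end) iterator pair and never copies its input.
class IterPairTokenizer {
public:
    IterPairTokenizer(const std::wstring& input, wchar_t separator)
        : first_(input.begin()), last_(input.end()), sep_(separator) {}

    // Splits the referenced range on the separator, dropping empty tokens.
    std::vector<std::wstring> tokens() const {
        std::vector<std::wstring> out;
        std::wstring current;
        for (std::wstring::const_iterator it = first_; it != last_; ++it) {
            if (*it == sep_) {
                if (!current.empty()) out.push_back(current);
                current.clear();
            } else {
                current += *it;
            }
        }
        if (!current.empty()) out.push_back(current);
        return out;
    }

private:
    std::wstring::const_iterator first_, last_;
    wchar_t sep_;
};

// Safe usage: the string is a named object that outlives the tokenizer.
// Passing std::wstring(L"...") directly to the constructor instead would
// leave first_ and last_ dangling as soon as the constructor returns.
std::vector<std::wstring> split_safely() {
    const std::wstring text(L"abacadaeafag");
    IterPairTokenizer tok(text, L'a');
    return tok.tokens();
}
```

In the original program the equivalent fix would be to give the `std::wstring` a name and pass that named object to the tokenizer.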

Beman Dawes

23 Feb 2004

2:50 a.m.

At 06:12 AM 2/22/2004, Bronek Kozicki wrote:

> Bronek Kozicki <brok@rubikon.pl> wrote:
>
>> I suppose there might be some problem with vc71 static runtime library, but maybe you have other explanation?
>
> OK, I know the reason. Sorry for bothering you. We have unusual manifestation of undefined behaviour here - tokenizer should not be initialized with temporary string. It does not make a copy of its string argument, instead it stores only begin and end iterators. Of course these iterators are no longer valid once temporary string variable is destroyed. Program could crash, but it can do anything else when dereferencing such dangling iterator.

Ouch! That seems an unfortunate design decision.

Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy.

I wonder if that was discussed at the time? Or is there a better way to prevent the problem?

--Beman
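The two constructor forms suggested here could look roughly like the following sketch. `SketchTokenizer` and the two helper functions are hypothetical names, the class is fixed to `std::string` and a single `char` separator for brevity, and copying is disabled (C++11 `= delete`) because a default copy would leave the stored iterators pointing into another object's buffer. None of this is the actual boost::tokenizer interface.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the two constructor forms suggested above.
class SketchTokenizer {
public:
    typedef std::string::const_iterator iterator;

    // Form 1: iterator pair. No copy is made; the caller must keep the
    // underlying range alive for as long as the tokenizer is used.
    SketchTokenizer(iterator first, iterator last, char sep)
        : copy_(), first_(first), last_(last), sep_(sep) {}

    // Form 2: container. The contents are copied, so a temporary is safe.
    SketchTokenizer(const std::string& input, char sep)
        : copy_(input), first_(copy_.begin()), last_(copy_.end()), sep_(sep) {}

    // Copying would duplicate iterators that point into another object.
    SketchTokenizer(const SketchTokenizer&) = delete;
    SketchTokenizer& operator=(const SketchTokenizer&) = delete;

    // Splits the range on the separator, dropping empty tokens.
    std::vector<std::string> tokens() const {
        std::vector<std::string> out;
        std::string current;
        for (iterator it = first_; it != last_; ++it) {
            if (*it == sep_) {
                if (!current.empty()) out.push_back(current);
                current.clear();
            } else {
                current += *it;
            }
        }
        if (!current.empty()) out.push_back(current);
        return out;
    }

private:
    std::string copy_;      // declared first so it is initialized first
    iterator first_, last_;
    char sep_;
};

// Form 2 with a temporary: well defined, at the cost of one copy.
std::vector<std::string> tokens_from_temporary() {
    SketchTokenizer tok(std::string("abacad"), 'a');
    return tok.tokens();
}

// Form 1: no copy; s must outlive tok.
std::vector<std::string> tokens_from_range(const std::string& s) {
    SketchTokenizer tok(s.begin(), s.end(), 'a');
    return tok.tokens();
}
```

The trade-off is the one raised in this thread: the container form becomes safe for temporaries but pays for a copy on every construction, including from long-lived lvalues.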

Thorsten Ottosen

8:17 a.m.

"Beman Dawes" <bdawes@acm.org> wrote in message news:4.3.2.7.2.20040222214057.031e7ab0@mailhost.esva.net...

> At 06:12 AM 2/22/2004, Bronek Kozicki wrote:
>
> [snip]
>
> Ouch! That seems an unfortunate design decision.

I agree.

> Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy. I wonder if that was discussed at the time? Or is there a better way to prevent the problem?

Why not change the existing constructor to

template<class Container>
tokenizer( /*const*/ Container& c, const TokenizerFunc& f = TokenizerFunc())

?

br

Thorsten

Bronek Kozicki

10:59 a.m.

On Mon, 23 Feb 2004 19:17:08 +1100, Thorsten Ottosen wrote:

> Why not change the existing constructor to
>
> template<class Container>
> tokenizer( /*const*/ Container& c, const TokenizerFunc& f = TokenizerFunc())

Do you mean to comment out the const-ness? This would break the following sample of (I believe quite typical) tokenizer usage:

void parse(const std::string& a)
{
    // typedef of Tokenizer, definition of separator s goes here
    Tokenizer t(a, s);
    // etc.
}

B.

Thorsten Ottosen

2:35 p.m.

"Bronek Kozicki" <brok@rubikon.pl> wrote in message news:17harepl17tae$.1vurpce0boti4$.dlg@40tude.net...

> On Mon, 23 Feb 2004 19:17:08 +1100, Thorsten Ottosen wrote:
>
>> Why not change the existing constructor to
>>
>> template<class Container>
>> tokenizer( /*const*/ Container& c, const TokenizerFunc& f = TokenizerFunc())
>
> you mean to comment out const-ness ? This would break following sample (an I believe quite typical) tokenizer usage:
>
> void parse(const std::string& a)
> {
>     // typedef of Tokenizer, definition of separator s goes here
>     Tokenizer t(a, s);
>     // etc.
> }

Sure; the inefficiency of the copy-taking constructor should just be well documented then. Your proposed change would probably not break code, but it could make it much slower, silently.

br

Thorsten

Bronek Kozicki

1:02 p.m.

On Sun, 22 Feb 2004 21:50:34 -0500, Beman Dawes wrote:

> Ouch! That seems an unfortunate design decision. Perhaps two forms of constructors could have been provided - one form that takes two iterators and doesn't store a copy of the contents, and another form like the current constructors which takes a container reference but (unlike the current constructor) makes a copy.

Actually, as far as I can see, both forms are currently provided, and both work the same way (they store begin and end iterators). I think that the proposed change will not break tokenizer. A few usage scenarios come to mind:

1. tokenizer initialized with a temporary - currently undefined behaviour, will be well defined;
2. tokenizer initialized with a const reference - will make an extra copy of the input data;
3. like 2, but afterwards the referenced object is modified (outside tokenizer, of course) - currently undefined or well-defined behaviour, depending on whether the iterators taken by the tokenizer constructor are still valid. This is where the most important change of behaviour will be seen (the collection used by tokenizer won't see changes to the input data);
4. tokenizer initialized with iterators - will work exactly the same way as currently;
5. assign from iterators - will work exactly the same way as currently;
6. assign from a const reference - will make an extra copy of the input data. Again a change of class behaviour - any change to the input data won't be seen by tokenizer;
7. assignment from a temporary - currently undefined behaviour, will be well defined.

The iterator-based constructor clearly expresses the idea that tokenizer is *iterating* over a given collection (the input string). Also (IMVHO) the expected behaviour when initializing from a const reference is to make a copy, as shown by recent posts on the boost-users list. I think that the proposed change will bring tokenizer's behaviour closer to user expectations.

Another option (not really a better one) is to add an extra bool parameter to the constructor, defaulting to false, to say whether an extra copy of the data should be made.

> I wonder if that was discussed at the time? Or is there a better way to prevent the problem?

I remember the thread "Perfect Forwarding" where David presented a clever way to detect copy-from-temporary (rvalue), but I do not know if any of his ideas are applicable here. It would be sooo much cleaner just to add a constructor like:

template <typename Container>
tokenizer(const Container&& c, const TokenizerFunc& f); // make a copy

Unfortunately the proposal to add the "&&" syntax (N1377) is still under Committee consideration ...

B.
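For readers with a later compiler: rvalue references were eventually standardized in C++11, so the idea can be sketched as below. This is only an illustration under that assumption. `RvalueAwareTokenizer` and the helper functions are hypothetical names, the class is fixed to `std::string` rather than templated on a container, and it takes a non-const `std::string&&` so that the temporary's buffer can be moved rather than copied.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: pick the storage strategy from the value category
// of the constructor argument.
class RvalueAwareTokenizer {
public:
    // Lvalue argument: remember the range only; the caller keeps it alive.
    explicit RvalueAwareTokenizer(const std::string& input)
        : owned_(), first_(input.begin()), last_(input.end()), owns_(false) {}

    // Rvalue argument (for example a temporary): take ownership of the
    // contents so the stored iterators remain valid.
    explicit RvalueAwareTokenizer(std::string&& input)
        : owned_(std::move(input)),
          first_(owned_.begin()), last_(owned_.end()), owns_(true) {}

    // Copying would duplicate iterators that point into another object.
    RvalueAwareTokenizer(const RvalueAwareTokenizer&) = delete;
    RvalueAwareTokenizer& operator=(const RvalueAwareTokenizer&) = delete;

    bool owns_input() const { return owns_; }
    std::string text() const { return std::string(first_, last_); }

private:
    std::string owned_;     // declared first so it is initialized first
    std::string::const_iterator first_, last_;
    bool owns_;
};

bool owns_when_given_lvalue() {
    const std::string s("abc");
    RvalueAwareTokenizer t{s};
    return t.owns_input();
}

bool owns_when_given_temporary() {
    RvalueAwareTokenizer t{std::string("abc")};
    return t.owns_input();
}

std::string text_of_temporary() {
    RvalueAwareTokenizer t{std::string("abc")};
    return t.text();
}
```

One caveat of this particular sketch: a const rvalue cannot bind to `std::string&&`, so it would select the `const std::string&` overload and still dangle.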

David Abrahams

24 Feb 2004

9:12 a.m.

Bronek Kozicki <brok@rubikon.pl> writes:

>> I wonder if that was discussed at the time? Or is there a better way to prevent the problem?
>
> I remember thread "Perfect Forwarding" where David presented clever way to detect copy-from--temporary-rvalue, but I do not know if any of his ideas are applicable here. It would be sooo much cleaner just to add constructor like:
>
> template <typename Container>
> tokenizer(const Container&& c, const TokenizerFunc& f); // make a copy

Actually, if that's the way tokenizer works, you can do it, I'm pretty sure:

---
#include <vector>
#include <iostream>

template <class TokenizerFunc>
struct tokenizer
{
    struct rvalue
    {
        template <class T>
        rvalue(T const&) {}
    };

    // handle lvalues
    template <typename Container>
    explicit tokenizer(Container& c, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "lvalue\n";
    }

    // handle rvalues
    explicit tokenizer(rvalue const& rval, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "rvalue\n";
    }
};

int main()
{
    std::vector<int> const x;
    tokenizer<int> z((x));
    tokenizer<int> z2((std::vector<int>()));
}

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Beman Dawes

2:49 p.m.

At 04:12 AM 2/24/2004, David Abrahams wrote:

> Actually, if that's the way tokenizer works, you can do it, I'm pretty sure:

I changed main() slightly (see below) and tried with several compilers:

g++ 3.3.1:
    lvalue.cpp: In function `int main()':
    lvalue.cpp:32: error: syntax error before `)' token

bcc32:
    error E2188 lvalue.cpp 32: Expression syntax in function main()
    error E2293 lvalue.cpp 32: ) expected in function main()

Codewarrior 8.3:
    lvalue
    rvalue
    lvalue
    rvalue

Intel 8.0 and VC++ 7.1:
    lvalue
    rvalue
    lvalue

Note 4th output line missing for Intel 8.0 and VC++ 7.1

What's going on here?

Mystified-in-Virginia,

--Beman

#include <vector>
#include <iostream>

template <class TokenizerFunc>
struct tokenizer
{
    struct rvalue
    {
        template <class T>
        rvalue(T const&) {}
    };

    // handle lvalues
    template <typename Container>
    explicit tokenizer(Container& c, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "lvalue\n";
    }

    // handle rvalues
    explicit tokenizer(rvalue const& rval, const TokenizerFunc& f = TokenizerFunc())
    {
        std::cout << "rvalue\n";
    }
};

int main()
{
    std::vector<int> const x;
    tokenizer<int> z( (x) );
    tokenizer<int> z2( (std::vector<int>()) );
    tokenizer<int> z3( x );
    tokenizer<int> z4( std::vector<int>() );
    return 0;
}

Peter Dimov

5:21 p.m.

Beman Dawes wrote:

> Note 4th output line missing for Intel 8.0 and VC++ 7.1 What's going on here?
>
> [...]
>
> tokenizer<int> z4( std::vector<int>() );

The usual function declaration?

tokenizer<int> z4( std::vector<int> f() );
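The parse pointed out here can be checked at compile time with C++11 type traits. The sketch below is only an illustration: `Probe` and the two helper functions are hypothetical names standing in for the tokenizer test case, and compilers may warn about the first declaration (which is exactly the point).

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a class constructible from a vector.
struct Probe {
    explicit Probe(const std::vector<int>&) {}
};

// With plain parentheses the line declares a function named z4 that takes
// a (pointer to) function returning std::vector<int> and returns Probe.
bool plain_parens_declare_function() {
    Probe z4(std::vector<int>());
    return std::is_function<decltype(z4)>::value;
}

// Extra parentheses around the argument cannot be read as a parameter
// declaration, so z2 is an object of type Probe.
bool extra_parens_declare_object() {
    Probe z2((std::vector<int>()));
    return std::is_same<decltype(z2), Probe>::value;
}
```

Since no object `z4` is ever constructed under this parse, no constructor runs and nothing is printed for that line, which is consistent with the missing fourth line reported for Intel 8.0 and VC++ 7.1.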
