Re: [boost] Formal Review: Assignment Library

Tom Brinkman wrote:
The review of Assignment Library library, written by Thorsten Ottosen 2003 (nesotto@cs.auc.dk) starts today(April 1, 2004)and runs for 10 days.
I think the library should be accepted, though I have some suggestions. The review below is based on reading the documentation for about an hour. First off, I think such a library is usefull. Besides stl container initialization, it would be usefull for other libraries which want to provide convenient initialization syntax. Now, to the suggestions 1. The library expects the user to overload 'make_insertion' for his types. Now suppose I have 'class Email' which list of 'to' and 'cc' addresses.I want to initialize both using the assign library, e.g Email e; e.add_to("Winnie-the-Pooh")("Ia"); e.add_cc("Story writer"); The problem is that I can have only one 'make_insertion' for the 'Email' class. Exposing the real containers used for storing addresses is not good idea. And there might be 'add_to_address' function which does additional handling of argument, before pushing it into the container. So, it would be nice if the library allowed to specify a *function* which will do the actual insertion. 2. The library examples contain using namespace assignment; which is likely to burn users in a subtle way. Suppose: // some header namespace email { class Email {}; std::ostream& operator<<(std::ostream& os, const Email& e); } // some source #include "some header" namespace email { using namespace assignment; void internal_function() { vector<int> v; .... v << 1, 2, 3; } } The operator<< declared in the namespace will hide the one from "using namespace", and unless user is aware about this problem he might spend a lot of time trying to understand why the right operator<< is not selected. I know it costed me about 1 hour some time ago. 3. While the += operator is fine with me, I don't like the << operator. I cannot associate the conventional meaning of the operator with 'assign all' semantic, so I suggest that this operator is dropped completely. You can provide alternative syntax for assign_all: assign(v) = 1, 2, 3, 4, 5; if assignl(v)(1)(2)(3)(4)(5) has too many paranthethis. (Since comma has lower predecence than =, the first syntax will really work). 4. The operator() and other operators in insert_assigner are declared inline. Suppose vector<some_type> v; assign(v)("first", "")("second", ""); This involves call to constructor of 'some_type' and possible calls to additional constructors needed to converted char* into types that 'some_type' excepts. If operator() is inline, those calls will be inlined as well, increasing code size. Similiar problem exists because single-argument operator() accepts a fixed type, possible requiring convertion on the caller's side. I'd suggest the declare the operators outside of class declaration. This will give the compiler more freedom -- e.g. inlining only if maximum optimization level is requested. 4. Just like other revievers, I'm not comfortable with monolitic stl.hpp 5. Operator += which is declared for all possible types makes me nervous. IIRC, stl formatiing library by Terje Sletteb used some trick to avoid exposing all those operators. Something like template<template<class T, class Alloc> sequence> class helper { friend std::ostream& operator<<(std::ostream&, const Sequence& s) { ... } } // in vector.hpp template helper<std::vector>; Terje said the explicit instantination bring the friend declaration into namespace. I'm somewhat rusty on friend lookup rules, so further investigation is needed. Documentation issues 1. The example for maps uses: map<const char*, int> month; which is a bit strange, even for example, given that comparison for "const char*" does not compare string values 2. In the graph example, I'm not sure how you add edge between vertices 4 and 6, in a graph with only 5 vertices. If new vertices are automatically added, this should be documented. And BTW, what about graph which don't use ints for vertex_descriptor? 3. I did not really understoon 'tuple_insert_assigner' role. Why is it needed? - Volodya

Hi Vladimir, Thanks for your comments.
1. The library expects the user to overload 'make_insertion' for his types. Now suppose I have 'class Email' which list of 'to' and 'cc' addresses.I want to initialize both using the assign library, e.g
Email e; e.add_to("Winnie-the-Pooh")("Ia"); e.add_cc("Story writer");
The problem is that I can have only one 'make_insertion' for the 'Email' class. Exposing the real containers used for storing addresses is not good idea.
I not sure why a constructor taking both a 'to' and a 'cc' cannot work for you in this case. But I see that the requirement of having more than "insert" choice might be useful.
And there might be 'add_to_address' function which does additional handling of argument, before pushing it into the container. So, it would be nice if the library allowed to specify a *function* which will do the actual insertion.
So you would like something like Email e = insert_with< &Email::add_cc >( e )( ... ); ? Or perhaps Email e = insert_with( bind( e, &Email::add_cc ) )(...); ?
2. The library examples contain
using namespace assignment;
which is likely to burn users in a subtle way. Suppose: [snip example] The operator<< declared in the namespace will hide the one from "using namespace", and unless user is aware about this problem he might spend a lot of time trying to understand why the right operator<< is not selected. I know it costed me about 1 hour some time ago.
Thanks for pointing this out. I'll change 'using namespace' to a namespace alias and make a warning about this behavior in a footnote.
3. While the += operator is fine with me, I don't like the << operator. I cannot associate the conventional meaning of the operator with 'assign all' semantic, so I suggest that this operator is dropped completely. You can provide alternative syntax for assign_all:
assign(v) = 1, 2, 3, 4, 5;
So you mean assign_all( v ) = 1,2,3,4,5; Seems ok. I guess if there is agreement about removing operator<<, the name hiding issue goes away for that part (but not for operator+=())
I'd suggest the declare the operators outside of class declaration. This will give the compiler more freedom -- e.g. inlining only if maximum optimization level is requested.
I did not know there was a difference. Could you point to the place in the standard, please (in particular, I couldn't find anything in 8.3.5 and 9.5 that supports it).
Documentation issues
1. The example for maps uses:
map<const char*, int> month;
which is a bit strange, even for example, given that comparison for "const char*" does not compare string values
true. It is probably better not to teach programmers this.
2. In the graph example, I'm not sure how you add edge between vertices 4 and 6, in a graph with only 5 vertices. If new vertices are automatically added, this should be documented. And BTW, what about graph which don't use ints for vertex_descriptor?
I think you hit a soft spot. As has been discussed in the "Forced client complexity" thread, I stumpled into that complexity. The thing is that I would like to hear people's apinion about putting these things in this library or not. If they should be in this library, I would like individual library authors to help.
3. I did not really understoon 'tuple_insert_assigner' role. Why is it needed?
Basically because it does not call any constructor, but simply forwards a tuple of arguments to your callback function. (It could be explained better). This is how I call add_edge() in BGL even though add_edge() takes 3 and 4 arguments. br Thorsten

Hi Thorsten,
to initialize both using the assign library, e.g
Email e; e.add_to("Winnie-the-Pooh")("Ia"); e.add_cc("Story writer");
The problem is that I can have only one 'make_insertion' for the 'Email' class. Exposing the real containers used for storing addresses is not good idea.
I not sure why a constructor taking both a 'to' and a 'cc' cannot work for you in this case.
Because there's separate list of several 'to' addresses and a separate list of 'cc' addresses. Note that I don't have a need for 'Email' class yet, it's just for exposition.
But I see that the requirement of having more than "insert" choice might be useful.
Okay.
So you would like something like
Email e = insert_with< &Email::add_cc >( e )( ... );
? Or perhaps
Email e = insert_with( bind( e, &Email::add_cc ) )(...);
Right. Though for implementing 'add_to' method inside 'Email' class it will be necessary make insert_assigner have template parameter for a functor type. For example: insert_assigner_with_functor< function< void (string) > > add_cc(const string& s) { insert_assigner_with_functor< function< void(string> > > r( bind(this, &Email::add_cc)); r(s); return r; }
The operator<< declared in the namespace will hide the one from "using namespace", and unless user is aware about this problem he might spend a
lot
of time trying to understand why the right operator<< is not selected. I
know
it costed me about 1 hour some time ago.
Thanks for pointing this out. I'll change 'using namespace' to a namespace alias and make a warning about this behavior in a footnote.
In fact, using assignment::operator<< does not have this problem, though it's cumbersome.
3. While the += operator is fine with me, I don't like the << operator. I cannot associate the conventional meaning of the operator with 'assign
all'
semantic, so I suggest that this operator is dropped completely. You can provide alternative syntax for assign_all:
assign(v) = 1, 2, 3, 4, 5;
So you mean assign_all( v ) = 1,2,3,4,5;
Right, 'assign' was a typo.
Seems ok. I guess if there is agreement about removing operator<<, the name hiding issue goes away for that part (but not for operator+=())
Exactly. For operator+= the issue is less serious because that operator is overloaded less often.
I'd suggest the declare the operators outside of class declaration. This
will
give the compiler more freedom -- e.g. inlining only if maximum
optimization
level is requested.
I did not know there was a difference. Could you point to the place in the standard, please (in particular, I couldn't find anything in 8.3.5 and 9.5 that supports it).
9.3/2 says: A member function may be defined in its class definition, in which case it is an inline member function. So defining a function inside class has the same effect as "inline" specifier.
I think you hit a soft spot. As has been discussed in the "Forced client complexity" thread, I stumpled into that complexity. The thing is that I would like to hear people's apinion about putting these things in this library or not. If they should be in this library, I would like individual library authors to help.
I think for BGL this could be handy. Most of the time I prefer to create the graphs in BGL in graphviz format and parse them, but for introductionary examples it's too complex. Maybe we can stick to supporting only basic adjacency_list<> with integer vertices? In either case, it should be documented what happens on adding (4,6) edge in the example.
3. I did not really understoon 'tuple_insert_assigner' role. Why is it
needed?
Basically because it does not call any constructor, but simply forwards a tuple of arguments to your callback function. (It could be explained better). This is how I call add_edge() in BGL even though add_edge() takes 3 and 4 arguments.
Ok, let me try from a different perspective: when instances of this class are created? At least the documentation does not say they are created anywhere... - Volodya

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:200404081532.06583.ghost@cs.msu.su...
So you would like something like
Email e = insert_with< &Email::add_cc >( e )( ... );
? Or perhaps
Email e = insert_with( bind( e, &Email::add_cc ) )(...);
Right. Though for implementing 'add_to' method inside 'Email' class it will be necessary make insert_assigner have template parameter for a functor type. For example:
insert_assigner_with_functor< function< void (string) > > add_cc(const string& s) { insert_assigner_with_functor< function< void(string> > > r( bind(this, &Email::add_cc));
r(s); return r; }
This is something you could use in your program options, right? I'll have to think more about it, but wouldn't this work: insert_assigner< Email > add_cc( const string& s ) { return insert< Email::add_cc >( *this )( s ); } (given averything I haven't thought about works :-))
In fact,
using assignment::operator<<
does not have this problem, though it's cumbersome.
I'd suggest the declare the operators outside of class declaration. This
will
give the compiler more freedom -- e.g. inlining only if maximum
optimization
level is requested.
I did not know there was a difference. Could you point to the place in
it had with my compilers. the
standard, please (in particular, I couldn't find anything in 8.3.5 and 9.5 that supports it).
9.3/2 says:
A member function may be defined in its class definition, in which case it is an inline member function.
So defining a function inside class has the same effect as "inline" specifier.
true, but the compiler is not forced to inline it. So unless you are talking about requiring linking, I don't get it.
3. I did not really understoon 'tuple_insert_assigner' role. Why is it
needed?
Basically because it does not call any constructor, but simply forwards a tuple of arguments to your callback function. (It could be explained better). This is how I call add_edge() in BGL even though add_edge() takes 3 and 4 arguments.
Ok, let me try from a different perspective: when instances of this class are created? At least the documentation does not say they are created anywhere...
I agree it is not explained; when you call insert() for adjacency_list it will return such an instance. br Thorsten

Hi Thorsten,
insert_assigner_with_functor< function< void (string) > > add_cc(const string& s) { insert_assigner_with_functor< function< void(string> > > r( bind(this, &Email::add_cc));
r(s); return r; }
This is something you could use in your program options, right?
Yes, I'm thinking about that.
I'll have to think more about it, but wouldn't this work:
insert_assigner< Email > add_cc( const string& s ) { return insert< Email::add_cc >( *this )( s ); }
(given averything I haven't thought about works :-))
The insert_assigner< Email > still have to somehow store the functor... so it somehow should know about the type of the stored functor, either because the type is template parameter, or because insert_assigner stores a boost::function which knows the type. I've used boost::function in the previous post just because I can't guess the type of bind(this, &Email::add_cc) ;-)
In fact,
using assignment::operator<<
does not have this problem, though it's cumbersome.
it had with my compilers.
Hmm... strictly speaking, using assignment::operator<<; should make this operator because as if it's declared in namespace where "using" appears, as far as name hiding is concerned.
A member function may be defined in its class definition, in which case
it
is an inline member function.
So defining a function inside class has the same effect as "inline" specifier.
true, but the compiler is not forced to inline it. So unless you are talking about requiring linking, I don't get it.
The point is: making the function 'inline', either with explicit keyword or by placing it in the class body *increases* the chances that it will be inlined. And since inlining here can cause code bloat, it's better not to increase those chances. I've just sketched an example which can be found at http://zigzag.cs.msu.su:7813/inline There are two files -- one with in-class definition and one with out-of-class definition. Both are compiled with -O3 but the function is inlined only in the first example and the number of instructions needed to each call grows from 4 to 13. In a real example the difference might be smaller, but it also might be larger :-( Some time ago, I was rather shocked to find that declaring a single option in program_options took something about 1K of code size. So you'd get 100K for 100 options, which is plain unacceptable. I solved that problem by chaning some parameter types from std::string to char* to avoid implicit conversion. It was some time ago, so situation might have improved in gcc or in libstdc++, but generally, unnecessary inlining will still increase the code size.
Ok, let me try from a different perspective: when instances of this class are created? At least the documentation does not say they are created anywhere...
I agree it is not explained; when you call insert() for adjacency_list it will return such an instance.
Then I can only suggest that it's documented somehow. As it stands tuple_insert_assigner looks like a component which is totally unrelated to the rest of the library. - Volodya

Hi Vladimir, "Vladimir Prus" <ghost@cs.msu.su> wrote in message news:c53i71$4gs$1@sea.gmane.org...
In fact,
using assignment::operator<<
does not have this problem, though it's cumbersome.
it had with my compilers.
Hmm... strictly speaking,
using assignment::operator<<;
should make this operator because as if it's declared in namespace where "using" appears, as far as name hiding is concerned.
you're right. I had a global using declaration. I'm still a bit confused, though.
true, but the compiler is not forced to inline it. So unless you are
talking about requiring linking, I don't get it.
The point is: making the function 'inline', either with explicit keyword or by placing it in the class body *increases* the chances that it will be inlined. And since inlining here can cause code bloat, it's better not to increase those chances.
I agree I should investigate how my lib perform in this regard. I do think that we wan't all the inlining that we can get. For example, vector<int> v; v += 1,2; should preferably be expanded to vector<int> v; v.push_back( 1 ); v.push_back( 2 ); I don't see any benefit of another layer of function-calls (except larger code size:-)).
I've just sketched an example which can be found at
http://zigzag.cs.msu.su:7813/inline
There are two files -- one with in-class definition and one with out-of-class definition. Both are compiled with -O3 but the function is inlined only in the first example and the number of instructions needed to each call grows from 4 to 13. In a real example the difference might be smaller, but it also might be larger :-(
with vc7.1 and como4.3 the results are: -rwxrwxrwx 1 nesotto None 135168 Apr 9 00:12 cl_inline.exe -rwxrwxrwx 1 nesotto None 135168 Apr 9 00:08 cl_inline2.exe -rwxrwxrwx 1 nesotto None 950272 Apr 9 00:07 como_inline.exe -rwxrwxrwx 1 nesotto None 950272 Apr 9 00:08 como_inline2.exe
It was some time ago, so situation might have improved in gcc or in libstdc++, but generally, unnecessary inlining will still increase the code size.
perhaps; except when it decreases code size. :-) Arguable it would be good to see more what other compilers do; and I would like to see your code optimized for size (-O3 is for speed, right). br Thorsten

Hi Thorsten,
The point is: making the function 'inline', either with explicit keyword or by placing it in the class body *increases* the chances that it will be inlined. And since inlining here can cause code bloat, it's better not to increase those chances.
I agree I should investigate how my lib perform in this regard. I do think that we wan't all the inlining that we can get. For example,
vector<int> v; v += 1,2;
should preferably be expanded to
vector<int> v; v.push_back( 1 ); v.push_back( 2 );
I don't see any benefit of another layer of function-calls (except larger code size:-)).
Right, in this case inline expansion is ok. Though even if there's another layer of function call the only drawback would be a single instantination of that other function, plus extra time for argument passing.
I've just sketched an example which can be found at
http://zigzag.cs.msu.su:7813/inline
There are two files -- one with in-class definition and one with out-of-class definition. Both are compiled with -O3 but the function is inlined only in the first example and the number of instructions needed to each call grows from 4 to 13. In a real example the difference might be smaller, but it also might be larger :-(
with vc7.1 and como4.3 the results are:
-rwxrwxrwx 1 nesotto None 135168 Apr 9 00:12 cl_inline.exe -rwxrwxrwx 1 nesotto None 135168 Apr 9 00:08 cl_inline2.exe -rwxrwxrwx 1 nesotto None 950272 Apr 9 00:07 como_inline.exe -rwxrwxrwx 1 nesotto None 950272 Apr 9 00:08 como_inline2.exe
950K for a tiny program? Oh, anyway, I think this can have two explanations 1. Those compilers have different opinions about inlining (which does not disprove the statement that 'inline' increases chances for inlining). 2. The binary size is rounded to some 'page' boundary
It was some time ago, so situation might have improved in gcc or in libstdc++, but generally, unnecessary inlining will still increase the code size.
perhaps; except when it decreases code size. :-) Arguable it would be good to see more what other compilers do; and I would like to see your code optimized for size (-O3 is for speed, right).
It would be also good to have tests with 1 call, 10 calls, 20 calls and so on, so that we can draw nice plot, instead of computing the number of assembly instructions ;-) Ok, I've wrote a script to measure that, please see the results in http://zigzag.cs.msu.su:7813/inline/data.pdf and data for optimization for space (-Os) in in http://zigzag.cs.msu.su:7813/inline/data_space_optimization.cvs (and is not much different from -03) - Volodya
participants (2)
-
Thorsten Ottosen
-
Vladimir Prus