
Hello everybody,

Recently there was a discussion about a typeof emulation library. The library determines the type of an expression by passing it to a function template, encoding the type as a list of integers, "returning" them one by one via sizeof(), and decoding the type.

One of the main problems of any such emulation is the need to associate types and templates with unique compile-time integral identifiers. This becomes an even bigger problem when expressions can contain types/templates from different libraries. It's not quite clear how to ensure uniqueness of identifiers in this case (a GUID would be an option, but it's far too big to fit into an integer).

An attempt was made to generate the identifiers automatically. However, since the mechanisms to achieve this always work inside only one compilation unit, there is no way to preserve the same identifiers for the same entities across different compilation units. Hence, inevitably, we run into an ODR violation. An attempt to work around this by defining stuff in an anonymous namespace forces everything that uses this facility to also be defined in an anonymous namespace or, again, the ODR is violated.

Although compilers don't seem to care much about the ODR in this particular case (meta-functions only, no runtime constructs), I feel uneasy about proceeding with a solution that does not quite comply with the standard.

And so, once again, I tried to come up with a registration scheme that, on the one hand, would not violate the ODR and, on the other hand, would not put a significant burden on the end user -- the crucial condition for this library to be of any use at all.

Here is the scheme:

The typeof library registers primitive types, as well as some other standard stuff, like functions, pointers, references, arrays, consts, etc. If the user only wants a combination of those, no additional action is required; for example, the following type would be handled:

    int& (*)(char[20], double&)

If, in addition to this, the user wants to handle types from other libraries, such as Spirit, Lambda, etc., it works as follows:

- The typeof library #defines a preprocessor constant, currently called BOOST_TYPEOF_USER_GROUP, which is the next identifier after the last one it actually used.

- The Spirit library registers its types against some symbolic identifiers. Only one per file is required (inside the same file __LINE__ does the job). Something that looks like an include guard seems like a decent choice. In addition, Spirit defines a separate header where it enumerates all these symbolic identifiers. This header looks something like this:

    <boost/spirit/groups.hpp>

    #ifndef SPIRIT_GROUPS_HPP_INCLUDED
    #define SPIRIT_GROUPS_HPP_INCLUDED

    #include <boost/spirit/typeof/typeof.hpp>

    enum
    {
        SPIRIT_REGISTER_HPP = BOOST_TYPEOF_USER_GROUP,
        SPIRIT_LAST
    };

    #undef BOOST_TYPEOF_USER_GROUP
    #define BOOST_TYPEOF_USER_GROUP SPIRIT_LAST

    #endif//SPIRIT_GROUPS_HPP_INCLUDED

  IOW, the Spirit identifiers start where the typeof library identifiers ended, and BOOST_TYPEOF_USER_GROUP is, again, the next available one.

- The Lambda library does the same, etc.

- The user combines all the groups in a single include, effectively chaining the enums:

    #ifndef GROUPS_HPP_INCLUDED
    #define GROUPS_HPP_INCLUDED

    #include <boost/spirit/groups.hpp>
    #include <boost/lambda/groups.hpp>

    #endif//GROUPS_HPP_INCLUDED

- The user includes this file before any registration file provided by these libraries. Since the file contains only enums, the compile-time penalty is minimal. This ensures that identifiers remain fixed across compilation units.

- Whenever the user needs to register her own classes, she provides identifiers by any means she prefers, such as enum, etc., starting from BOOST_TYPEOF_USER_GROUP.

See http://groups.yahoo.com/group/boost/files/typeof.zip (or Spirit repository, TYPEOF_DEV branch) for how it all works together.

I apologize for such a long post (if you are still here :) )

Any comments?

Regards, Arkadiy
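
A minimal sketch of the sizeof() trick described at the start of this message, simplified to a single integer per type (the message describes encoding a type as a list of integers). The names type_to_id, id_to_type, REGISTER_TYPE, encode and TYPEOF_SKETCH are hypothetical and are not the library's API; the identifiers 1 and 2 are arbitrary.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical registration tables: type -> integer and integer -> type.
template <class T> struct type_to_id;
template <std::size_t Id> struct id_to_type;

// Registers one type against one compile-time identifier (both directions).
#define REGISTER_TYPE(T, Id)                                                   \
    template <> struct type_to_id<T> { static const std::size_t value = Id; }; \
    template <> struct id_to_type<Id> { typedef T type; };

REGISTER_TYPE(int, 1)
REGISTER_TYPE(double, 2)

// Declared but never defined: it is only ever used inside sizeof().
// T is deduced from the expression, and the size of the returned
// char array carries the identifier back out as a constant.
template <class T>
char (&encode(const T&))[type_to_id<T>::value];

// Decode the identifier back into a type.
#define TYPEOF_SKETCH(expr) id_to_type<sizeof(encode(expr))>::type

static_assert(std::is_same<TYPEOF_SKETCH(1 + 2), int>::value,
              "int + int is int");
static_assert(std::is_same<TYPEOF_SKETCH(1 + 2.5), double>::value,
              "int + double is double");
```

Every registered type needs its own integer here, which is where the uniqueness problem discussed in this thread comes from.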

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
- Whenever the user needs to register her own classes, she provides identifiers by any means she prefers, such as enum, etc., starting from BOOST_TYPEOF_USER_GROUP.
How will multiple independent users, or libraries that use boost, avoid collisions? IMO it's better to see what we can get away with w.r.t the ODR. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

"David Abrahams" <dave@boost-consulting.com> wrote
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
- Whenever the user needs to register her own classes, she provides identifiers by any means she prefers, such as enum, etc., starting from BOOST_TYPEOF_USER_GROUP.
How will multiple independent users, or libraries that use boost, avoid collisions?
If the class is registered in a header, it should be registered against a symbol somehow produced from the name of this header (in a similar way to include guards). If the author of such classes wants to be independent, she supplies a separate header, where her symbols are enumerated, looking something like:

    enum
    {
        MY_SYMBOL_1 = BOOST_TYPEOF_LAST,
        MY_SYMBOL_2,
        ...
        MY_SYMBOL_LAST
    };

    #undef BOOST_TYPEOF_LAST
    #define BOOST_TYPEOF_LAST MY_SYMBOL_LAST

Ultimately, at some point, somebody is writing a CPP file. This end user has to collect all such headers, and "chain" them in a single header, thus emulating a single enum.

Arkadiy
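
The chaining described above can be sketched in a single file. This is only an illustration: the starting value 100 and the LIB_A_* / LIB_B_* symbols are assumptions, and the two "headers" are simulated by comments rather than real #include files.

```cpp
// typeof library: first identifier free for users (the value 100 is assumed).
#define BOOST_TYPEOF_LAST 100

// --- hypothetical lib_a/groups.hpp ---
enum { LIB_A_REGISTER_HPP = BOOST_TYPEOF_LAST, LIB_A_LAST };
#undef BOOST_TYPEOF_LAST
#define BOOST_TYPEOF_LAST LIB_A_LAST

// --- hypothetical lib_b/groups.hpp ---
enum { LIB_B_FOO_HPP = BOOST_TYPEOF_LAST, LIB_B_BAR_HPP, LIB_B_LAST };
#undef BOOST_TYPEOF_LAST
#define BOOST_TYPEOF_LAST LIB_B_LAST

// The separate enums now behave like one contiguous enum.
static_assert(LIB_A_REGISTER_HPP == 100, "A starts where typeof ended");
static_assert(LIB_B_FOO_HPP == 101, "B starts where A ended");
static_assert(BOOST_TYPEOF_LAST == 103, "next free identifier");
```

The resulting values depend on the order in which the headers are chained, which is why the order has to be the same in every compilation unit.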

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
"David Abrahams" <dave@boost-consulting.com> wrote
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
- Whenever the user needs to register her own classes, she provides identifiers by any means she prefers, such as enum, etc., starting from BOOST_TYPEOF_USER_GROUP.
How will multiple independent users, or libraries that use boost, avoid collisions?
If the class is registered in a header, it should be registered against a symbol somehow produced from the name of this header (in a similar way to include guards). If the author of such classes wants to be independent, she supplies a separate header, where her symbols are enumerated, looking something like:

    enum
    {
        MY_SYMBOL_1 = BOOST_TYPEOF_LAST,
        MY_SYMBOL_2,
        ...
        MY_SYMBOL_LAST
    };

    #undef BOOST_TYPEOF_LAST
    #define BOOST_TYPEOF_LAST MY_SYMBOL_LAST

Ultimately, at some point, somebody is writing a CPP file. This end user has to collect all such headers, and "chain" them in a single header, thus emulating a single enum.
That sounds like a serious usability problem to me. Furthermore, we'll still have an ODR violation if two separately-compiled libraries do the chaining in different ways. Am I wrong? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
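
The second objection can be illustrated with a small sketch. Two namespaces stand in for two separately compiled units that chain the same two hypothetical group headers in opposite order; the names and the starting value 100 are assumptions made for the example.

```cpp
// Stand-in for a unit that includes lib_a's groups first, then lib_b's.
namespace unit_one {
    enum { LIB_A_GROUP = 100, LIB_A_LAST };
    enum { LIB_B_GROUP = LIB_A_LAST, LIB_B_LAST };
}

// Stand-in for a unit that includes lib_b's groups first, then lib_a's.
namespace unit_two {
    enum { LIB_B_GROUP = 100, LIB_B_LAST };
    enum { LIB_A_GROUP = LIB_B_LAST, LIB_A_LAST };
}

// The same symbolic group ends up with a different identifier in each unit,
// so anything encoded from it would be defined differently in each unit.
static_assert(int(unit_one::LIB_A_GROUP) != int(unit_two::LIB_A_GROUP),
              "chaining order changes the identifiers");
static_assert(int(unit_one::LIB_B_GROUP) != int(unit_two::LIB_B_GROUP),
              "chaining order changes the identifiers");
```

In real code there would be no namespaces separating the two sets of definitions, so the two units would disagree about the same names.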

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
Ultimately, at some point, somebody is writing a CPP file. This end user has to collect all such headers, and "chain" them in a single header, thus emulating a single enum.
"David Abrahams" <dave@boost-consulting.com> wrote
That sounds like a serious usability problem to me. Furthermore, we'll still have an ODR violation if two separately-compiled libraries do the chaining in different ways. Am I wrong?
Right, but it does provide the ability to create ODR-compliant code (even if only inside one module) by applying a certain discipline... And I don't quite agree that this discipline is too hard to follow. People use system-wide enums left and right... and in many cases it is just classes from a couple of libraries that the user really needs.

As for separately-compiled libraries, do you think such a library could have any remaining traces of the template instantiations we are discussing?

Furthermore, without the discipline, provided people don't care about the ODR, it can be made almost as easy as automatic registration. For example, the registration headers can directly #include the "enum" headers, so that if they weren't previously chained, they chain in a random way. Even automatic ID-generation facilities can be provided in addition, for end users who do not care about the ODR.

But, as was said here yesterday, I think we should not punish the "I know what I am doing" group. Let's at least provide the ability to create ODR-compliant code...

Arkadiy

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
"David Abrahams" <dave@boost-consulting.com> wrote
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
Ultimately, at some point, somebody is writing a CPP file. This end user has to collect all such headers, and "chain" them in a single header, thus emulating a single enum.
That sounds like a serious usability problem to me. Furthermore, we'll still have an ODR violation if two separately-compiled libraries do the chaining in different ways. Am I wrong?
Right, but it does provide the ability to create ODR-compliant code (even if only inside one module) by applying a certain discipline... And I don't quite agree that this discipline is too hard to follow. People use system-wide enums left and right...
Yes, but there's seldom a requirement that they be unique system-wide while at the same time being distributed across headers.
and in many cases this is just classes from a couple of libraries that the user really needs.
As for separately-compiled libraries, do you think such a library could have any remaining traces of the template instantiations we are discussing?
If not, the original ODR problem is a total non-issue from a practical POV so you might as well go ahead with automatic ID generation.
Furthermore, without the discipline, provided people don't care about the ODR, it can be made almost as easy as automatic registration. For example, the registration headers can directly #include the "enum" headers, so that if they weren't previously chained, they chain in a random way. Even automatic ID-generation facilities can be provided in addition, for end users who do not care about the ODR.
OK.
But, as was said here yesterday, I think we should not punish the "I know what I am doing" group. Let's at least provide the ability to create ODR-compliant code...
If the ODR matters in this case, we have to do it. Otherwise, I don't care one way or the other. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

"David Abrahams" <dave@boost-consulting.com> wrote
If not, the original ODR problem is a total non-issue from a practical POV so you might as well go ahead with automatic ID generation.
Isn't it possible that the compiler keeps some information in memory as it compiles different units? So that, when it sees a class a second time, it detects the ODR violation and does whatever it decides is appropriate (maybe just ignore it, and use the first one, the current one, one randomly selected from whatever it has seen so far, etc.). But I can hardly imagine that the compiler would persist the information about such a class in the object module... And this would be the only way for the linker to notice it at all for separately compiled libraries...
If the ODR matters in this case, we have to do it. Otherwise, I don't care one way or the other.
So here is the question: how do we know whether it matters or not? If we come up with an obvious example of an ODR violation, and the compiler does not complain about it, can we be sure that it will not complain in other contexts? Or can we be sure that another context will not cause some kind of "ICE"?

I guess my point is that I would like to prove that the ODR is not important in this case, but I don't have enough knowledge of compilers to do it with reasonable accuracy :(

Where are those good old days when it was claimed that compilation of one unit is totally independent from the other? Maybe we should all just switch to anonymous namespaces? ;-)

Arkadiy

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
I guess my point is that I would like to prove that ODR is not important in this case, but I don't have enough knowledge of compilers to do it with reasonable accuracy :(
Here's my data point: so far the ODR problem with Boost.Bind placeholders has never caused any complaints. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

"David Abrahams" <dave@boost-consulting.com> wrote
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
I guess my point is that I would like to prove that ODR is not important in this case, but I don't have enough knowledge of compilers to do it with reasonable accuracy :(
Here's my data point: so far the ODR problem with Boost.Bind placeholders has never caused any complaints.
If we decided to go with automatic registration, would you suggest putting the registration templates inside the anonymous namespace?

By the way, Daniel James suggested an idea which I didn't like at first, but, come to think of it, might not be that bad after all:

We could create a compile-time GUID template, and register against its instantiations. Some of the outputs of Guidgen are quite suitable for instantiating such a template. This class could be passed from the function using sizeof() in four steps...

Probably overkill, though. And it would probably introduce a noticeable compile-time performance penalty...

Besides, is GUID a portable concept?

Arkadiy

"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
"David Abrahams" <dave@boost-consulting.com> wrote
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
I guess my point is that I would like to prove that ODR is not important in this case, but I don't have enough knowledge of compilers to do it with reasonable accuracy :(
Here's my data point: so far the ODR problem with Boost.Bind placeholders has never caused any complaints.
If we decided to go with automatic registration, would you suggest putting the registration templates inside the anonymous namespace?
Yes.
By the way, Daniel James suggested an idea which I didn't like at first, but, come to think of it, might not be that bad after all:
We could create a compile-time GUID template, and register against its instantiations. Some of the outputs of Guidgen are quite suitable for instantiating such a template. This class could be passed from the function using sizeof() in four steps...
Probably overkill, though. And it would probably introduce a noticeable compile-time performance penalty...
I don't understand it, so no comment.
Besides, is GUID a portable concept?
I Dunno. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Thu, 22 Jul 2004 18:50:19 -0400, David Abrahams <dave@boost-consulting.com> wrote:
"Arkadiy Vertleyb" <vertleyb@hotmail.com> writes:
Besides, is GUID a portable concept?
I Dunno.
Yes it is when you call it a UUID. Here is an example of a somewhat portable one: http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/UUID_8h-source.html Certainly clumsier than a size_t. Regards, Matt Hurd.

On Fri, 23 Jul 2004 10:49:53 +1000 Matt Hurd <matt.hurd@gmail.com> wrote:
Yes it is when you call it a UUID.
Here is an example of a somewhat portable one: http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/UUID_8h-source.html
Certainly clumsier than a size_t.
There are many portable versions of creating a UUID, but I would imagine that one would have to be available at compile time to make use of it in templates. I think M$ compilers provide this as a preprocessor extension, but I am not sure about any others.

"Jody Hagins" <jody-boost-011304@atdesk.com> wrote
Matt Hurd <matt.hurd@gmail.com> wrote:
Yes it is when you call it a UUID.
Here is an example of a somewhat portable one:
http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/UUID_8h-source.html
Certainly clumsier than a size_t.
There are many portable versions of creating a UUID, but I would imagine that one would have to be available at compile time to make use of it in templates. I think M$ compilers provide this as a preprocessor extension, but I am not sure about any others.
What I really meant was to piggyback on the tool. Recall that guidgen, which MS provides, can generate something like this:

    // {DCC34F20-AEE3-4e2f-B2CE-A6FA9ED6AE05}
    DEFINE_GUID(<<name>>,
    0xdcc34f20, 0xaee3, 0x4e2f, 0xb2, 0xce, 0xa6, 0xfa, 0x9e, 0xd6, 0xae, 0x5);

Every MS programmer is used to doing this to create COM objects. A compile-time version can be easily implemented, something like:

    template<long, short, short, char, char, char, char, char, char, char, char>
    struct uuid;

The user would then instantiate it by just copy-pasting the stuff generated by the tool:

    uuid<0xdcc34f20, 0xaee3, 0x4e2f, 0xb2, 0xce, 0xa6, 0xfa, 0x9e, 0xd6, 0xae, 0x5>

and here we have a unique type. Now, creating a unique type is a no-brainer, and everybody is used to doing this, but the advantage of this one is that it can be split into four integers and later re-created. Therefore it can be passed via sizeof, and therefore it can be used in the typeof implementation.

Having said all this, I don't actually believe it's a good idea to make poor compilers operate on four integers where one is enough. The compilers are already stressed beyond any reasonable measure by all our meta-programming exercises. And I believe this particular place is kind of a bottleneck in my implementation...

Arkadiy
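
A rough sketch of the "split into four integers and later re-created" idea. The names split and join are hypothetical, and the sketch assumes unsigned, fixed-width parameter types (rather than the long/short/char of the message) so that the guidgen values fit without narrowing on current compilers. It does not show passing the integers through sizeof().

```cpp
#include <cstdint>
#include <type_traits>

// Compile-time GUID: one distinct type per combination of values.
template <std::uint32_t D1, std::uint16_t D2, std::uint16_t D3,
          std::uint8_t B0, std::uint8_t B1, std::uint8_t B2, std::uint8_t B3,
          std::uint8_t B4, std::uint8_t B5, std::uint8_t B6, std::uint8_t B7>
struct uuid {};

// Split a uuid<...> into four 32-bit words.
template <class U> struct split;

template <std::uint32_t D1, std::uint16_t D2, std::uint16_t D3,
          std::uint8_t B0, std::uint8_t B1, std::uint8_t B2, std::uint8_t B3,
          std::uint8_t B4, std::uint8_t B5, std::uint8_t B6, std::uint8_t B7>
struct split<uuid<D1, D2, D3, B0, B1, B2, B3, B4, B5, B6, B7> > {
    static const std::uint32_t w0 = D1;
    static const std::uint32_t w1 = (std::uint32_t(D2) << 16) | D3;
    static const std::uint32_t w2 = (std::uint32_t(B0) << 24) |
                                    (std::uint32_t(B1) << 16) |
                                    (std::uint32_t(B2) << 8) | B3;
    static const std::uint32_t w3 = (std::uint32_t(B4) << 24) |
                                    (std::uint32_t(B5) << 16) |
                                    (std::uint32_t(B6) << 8) | B7;
};

// Re-create the uuid<...> type from the four words.
template <std::uint32_t W0, std::uint32_t W1,
          std::uint32_t W2, std::uint32_t W3>
struct join {
    typedef uuid<W0,
                 std::uint16_t(W1 >> 16), std::uint16_t(W1 & 0xFFFF),
                 std::uint8_t(W2 >> 24), std::uint8_t((W2 >> 16) & 0xFF),
                 std::uint8_t((W2 >> 8) & 0xFF), std::uint8_t(W2 & 0xFF),
                 std::uint8_t(W3 >> 24), std::uint8_t((W3 >> 16) & 0xFF),
                 std::uint8_t((W3 >> 8) & 0xFF), std::uint8_t(W3 & 0xFF)>
        type;
};

// The GUID from the message, copy-pasted from the guidgen output.
typedef uuid<0xdcc34f20, 0xaee3, 0x4e2f,
             0xb2, 0xce, 0xa6, 0xfa, 0x9e, 0xd6, 0xae, 0x05> my_id;
typedef split<my_id> parts;

// The round trip gives back exactly the same type.
static_assert(std::is_same<join<parts::w0, parts::w1,
                                parts::w2, parts::w3>::type,
                           my_id>::value,
              "split followed by join is the identity");
```

Each registered type would cost four integers instead of one under this scheme, which is the compile-time overhead the message is worried about.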

"Daniel James" <daniel@calamity.org.uk> wrote
GUID would be an option, but it's far too big to fit into an integer
This seems a little too obvious, but couldn't you just use more than one integer?
Yes we could use 4 integers instead of 1, but this would:
1) decrease the compile-time performance (which is not that great already), and
2) increase the requirements for the size of the mpl::vector...
I don't think we can afford this...
Arkadiy

On Thu, 22 Jul 2004 13:27:42 -0400, Arkadiy Vertleyb wrote:
Yes we could use 4 integers instead of 1, but this would:
1) decrease the compile-time performance (which is not that great already), and 2) increase the requirements for the size of the mpl::vector...
I don't think we can afford this...
I think you can get around the second point, but you're right, it's probably not worth it. Daniel
participants (5)
- Arkadiy Vertleyb
- Daniel James
- David Abrahams
- Jody Hagins
- Matt Hurd