Dear all,
we use Boost a lot in our code, but unfortunately this slows down build times enormously. Although we use precompiled headers in Visual Studio, tracing the includes (with the /showIncludes option) shows that even relatively basic libraries (e.g. tuples, boost\utility.hpp, Boost.Function and Boost.Bind) end up pulling in a massive number of MPL and type-traits header files.
So I begin to wonder whether all this wonderful template and preprocessor stuff is really such a good idea for those basic libraries. Besides not improving the readability of the source, my guess is that it works against short build times.
Can a Boost guru comment on this? Note that this is not a criticism, only an observation. I always like to write a boost::bind construction, especially with the == addition in 1.33.
Wkr,
me
Have you tried distributing your build?
On 11/4/05, gast128
Dear all,
we use Boost a lot in our code, but unfortunately this slows down build times enormously. Although we use precompiled headers in Visual Studio, tracing the includes (with the /showIncludes option) shows that even relatively basic libraries (e.g. tuples, boost\utility.hpp, Boost.Function and Boost.Bind) end up pulling in a massive number of MPL and type-traits header files.
So I begin to wonder whether all this wonderful template and preprocessor stuff is really such a good idea for those basic libraries. Besides not improving the readability of the source, my guess is that it works against short build times.
Can a Boost guru comment on this? Note that this is not a criticism, only an observation. I always like to write a boost::bind construction, especially with the == addition in 1.33.
Wkr, me
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
gast128 wrote:
Can a Boost guru comment on this? Note that this is not a criticism, only an observation. I always like to write a boost::bind construction, especially with the == addition in 1.33.
Boost.Bind only includes a reasonably small bit of MPL indirectly via boost/ref.hpp. This can be avoided in principle, but it won't result in any measurable gains. boost/utility.hpp doesn't include much, but this header isn't of much use anyway. :-) boost/function.hpp does seem to include the world (3 seconds for a single include on my machine), but I'm not sure whether this can be avoided. It's a fairly complex component.
Two questions:
1. Does the latest version of vector.hpp in CVS fix the problem of serialized vectors not being read back in from archives?
2. If so, this file includes boost/archive/has_fast_array_serialization.hpp, but that header doesn't seem to be in CVS (at least, not in the archive directory). Where can I get hold of it?
Thanks,
Paul
Paul Giaccone wrote:
Two questions:
1. Does the latest version of vector.hpp in the CVS fix the problem of serialized vectors not being read back in from archives?
2. If so, this file includes boost/archive/has_fast_array_serialization.hpp, but that header doesn't seem to be in CVS (at least, not in the archive directory). Where can I get hold of it?
Oops, I grabbed the file from HEAD instead of 1_33_0. I've got the right file now and am testing it out.
I am serialising data structures that include objects of the form:
std::vector
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false. Perhaps part of the problem is that I used 0 and 1 for bool variables in order not to include the English strings "true" and "false" in text files, and to save space. I'll think about this. Robert Ramey
If handling uninitialised variables is not practical, then perhaps there could be a warning in the documentation that uninitialised booleans will cause stream errors on deserialisation.
Paul
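Paul's stream errors are consistent with how the standard iostreams parse a bool in their default (noboolalpha) format: a numeric field of 0 or 1 is accepted, and any other value sets failbit. The sketch below assumes, without having checked the archive sources, that a text archive ends up reading bools through operator>>; read_bool_field is a hypothetical helper, and the token "2" stands in for whatever an uninitialized bool happened to produce on output.

```cpp
#include <sstream>
#include <string>

// Reads one bool from a text field using the default (noboolalpha) format.
// Only "0" and "1" are accepted; any other value sets failbit, which is the
// kind of stream error a garbage bool produces on deserialisation.
bool read_bool_field(const std::string& field, bool& out) {
    std::istringstream is(field);
    is >> out;
    return !is.fail();
}
```

For example, read_bool_field("1", b) succeeds and sets b to true, while read_bool_field("2", b) and read_bool_field("true", b) both report failure.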
"Robert Ramey"
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0?
Technically, no. We know that an initialized bool will always contain either true or false. 3.9.1/6: "Values of type bool are either true or false." true can be converted to 1 and false can be converted to 0; see 4.5 and 4.7.
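The conversions cited here can be checked at compile time. This sketch uses C++11 static_assert and constexpr, which postdate this thread; to_int and from_int are illustrative names only. Converting a bool to int gives exactly 0 or 1, while converting an int to bool maps zero to false and everything else to true.

```cpp
// bool -> int (4.5 and 4.7 in C++03): false becomes 0 and true becomes 1,
// with no other possible result for a valid bool.
constexpr int to_int(bool b) { return b; }

// int -> bool (boolean conversion, 4.12 in C++03): zero becomes false and
// every non-zero value becomes true.
constexpr bool from_int(int v) { return static_cast<bool>(v); }

static_assert(to_int(true) == 1, "true converts to 1");
static_assert(to_int(false) == 0, "false converts to 0");
static_assert(from_int(2), "non-zero converts to true");
static_assert(!from_int(0), "zero converts to false");
```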
I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
No, an uninitialized bool can't be read without inducing undefined behavior, same as any other uninitialized data. It has no value that can be legally detected. -- Dave Abrahams Boost Consulting www.boost-consulting.com
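Since there is no legal way to inspect an uninitialized bool, the practical fix is to make sure one never exists. A minimal sketch of the usual idioms, using hypothetical settings types: C++11 default member initializers and value-initialization with {}, plus the C++03 constructor-initializer equivalent.

```cpp
// Hypothetical types showing ways to guarantee a bool holds true or false
// before anything reads (or serializes) it.
struct settings {
    bool verbose = false;  // C++11 default member initializer
    bool dirty{};          // value-initialized to false
    int  count{};          // value-initialized to 0
};

struct legacy_settings {   // C++03 style: initialize in the constructor
    bool verbose;
    legacy_settings() : verbose(false) {}
};

inline bool fresh_flag() {
    bool b{};              // value-initialization: b is false
    return b;
}
```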
Robert Ramey wrote:
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
No, bools won't always contain 1 or 0; like other types, their value is undefined if they have not been initialised, depending upon where they have been declared. <caveat> C++ isn't my day job...I just use it for fun things... </caveat> When bools are used in logical operations they are converted to integers, so depending on what your bool happens to contain before initialisation it could evaluate to either true or false.
Perhaps part of the problem is that I used 0 and 1 for bool variables in order not to include the English strings "true" and "false" in text files, and to save space.
I'll think about this.
My two proposals would be one that encourages pragmatism and one that encourages correctness.
Pragmatic:
Well, how about simply treating anything other than 1 as false? I realise this means that you are implicitly initialising someone else's variable should they serialise and then deserialise, but it would seem to preserve the effect that you would witness should you use such a variable without performing that set of operations anyway, so it would be an "invisible" side-effect.
Correct:
Initialise all your variables. Shoot all programmers who don't!
And of course, the one true way: tell everyone to initialise their variables or bad things might happen, and then be lenient on parsing anyway.
Regards,
n
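The "pragmatic" option amounts to a lenient loader. The following is a hypothetical sketch, assuming bools are stored as whitespace-separated text tokens: only the token "1" loads as true, and anything else (including an out-of-range value written from an uninitialized bool) loads as false instead of failing the stream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical lenient loader: the token "1" loads as true and any other
// token (or a missing one) loads as false, so an odd bool value never puts
// the stream into a failed state.
bool load_bool_lenient(std::istream& is) {
    std::string token;
    is >> token;  // if nothing can be read, token stays empty
    return token == "1";
}
```

Fed std::istringstream("1 0 2 255"), it yields true, false, false, false. The trade-off is that a corrupted archive is silently accepted, which is exactly what the "correct" option avoids.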
Nigel Rantor wrote:
Robert Ramey wrote:
Pragmatic:
Well, how about simply treating anything other than 1 as false? I realise this means that you are implicitly initialising someone else's variable should they serialise and then deserialise but it would seem to preserve the effect that you would witness should you use such a variable without performing that set of operations anyway so it would be an "invisible" side-effect.
Correct:
Initialise all your variables. Shoot all programmers who don't!
And of course, the one true way - tell everyone to initialise their variables or bad things might happen and then be lenient on parsing anyway.
Actually, my preferred way would be to trap the usage of an uninitialized bool variable when it is saved. It's not clear that I can do this. But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1, and serialize it otherwise. Robert Ramey
Robert Ramey wrote:
Nigel Rantor wrote:
Robert Ramey wrote:
Pragmatic:
Well, how about simply treating anything other than 1 as false? I realise this means that you are implicitly initialising someone else's variable should they serialise and then deserialise but it would seem to preserve the effect that you would witness should you use such a variable without performing that set of operations anyway so it would be an "invisible" side-effect.
Correct:
Initialise all your variables. Shoot all programmers who don't!
And of course, the one true way - tell everyone to initialise their variables or bad things might happen and then be lenient on parsing anyway.
Actually, my preferred way would be to trap the usage of an uninitialized bool variable when it is saved. It's not clear that I can do this. But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1, and serialize it otherwise.
Sorry, I didn't read the later postings before responding to Robert's reply. This is probably the ideal behaviour as it traps the error as soon as it happens. Are there any other types that have restrictions on their values? I can't think of any, apart from float and double, where not all bit patterns correspond to a floating-point value. How does the serialization library handle NaN and +/-infinity? Oh, and consider me shot - I'm hunting down my uninitialised variables as we speak :-) Paul
"Robert Ramey"
Nigel Rantor wrote:
Robert Ramey wrote:
Pragmatic:
Well, how about simply treating anything other than 1 as false? I realise this means that you are implicitly initialising someone else's variable should they serialise and then deserialise but it would seem to preserve the effect that you would witness should you use such a variable without performing that set of operations anyway so it would be an "invisible" side-effect.
Correct:
Initialise all your variables. Shoot all programmers who don't!
And of course, the one true way - tell everyone to initialise their variables or bad things might happen and then be lenient on parsing anyway.
Actually, my preferred way would be to trap the usage of an uninitialized bool variable when it is saved. It's not clear that I can do this.
No, of course you can't. An uninitialized bool could look exactly like an initialized one. And if it's uninitialized and you read it, you could crash the program.
But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1
No!! If you convert a bool to an integer and get something other than 0 or 1 then there's a bug in the program (or in the compiler). An assertion is appropriate.
and serialize it otherwise.
Why all this talk of conversion to integers? I still can't understand it. Why not just serialize the bool? -- Dave Abrahams Boost Consulting www.boost-consulting.com
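Dave's question can be illustrated with plain iostreams: operator<< already writes a bool as "1" or "0" in the default format, which also avoids the English strings Robert mentioned, and writes "true"/"false" only when std::boolalpha is set. write_bool is a hypothetical helper; whether a given archive class writes bools exactly this way is not established here.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Writes a bool directly with operator<<. In the default (noboolalpha)
// format the stream emits "1" or "0"; with std::boolalpha it emits the
// locale's names, which are "true" and "false" in the classic locale.
std::string write_bool(bool b, bool alpha) {
    std::ostringstream os;
    if (alpha)
        os << std::boolalpha;
    os << b;
    return os.str();
}
```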
David Abrahams
But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1
No!! If you convert a bool to an integer and get something other than 0 or 1 then there's a bug in the program (or in the compiler). An assertion is appropriate.
But that said, I wouldn't try to assert that any more than I'd try to validate that every pointer passed points at a valid object of the pointee type. -- Dave Abrahams Boost Consulting www.boost-consulting.com
On 11/10/05 11:21 AM, "David Abrahams"
"Robert Ramey"
writes: [SNIP] But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1
No!! If you convert a bool to an integer and get something other than 0 or 1 then there's a bug in the program (or in the compiler). An assertion is appropriate. [TRUNCATE]
The bug would be in the compiler since other values are not allowed (see s4.5p4 and s4.7p4). I don't think you could really "assert" for this because you can't trust "assert" (since you can't trust the compiler)! If the "bool" was uninitialized before the conversion, then you have undefined behavior and the not-0-nor-1 result is the least of your problems. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker
On 11/10/05 11:21 AM, "David Abrahams"
wrote: "Robert Ramey"
writes: [SNIP] But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1
No!! If you convert a bool to an integer and get something other than 0 or 1 then there's a bug in the program (or in the compiler). An assertion is appropriate. [TRUNCATE]
The bug would be in the compiler since other values are not allowed (see s4.5p4 and s4.7p4).
No, a bool that was left uninitialized could be converted into a different int value as one expression of undefined behavior. -- Dave Abrahams Boost Consulting www.boost-consulting.com
On 11/9/05 11:15 PM, "Robert Ramey"
Nigel Rantor wrote:
Pragmatic:
Well, how about simply treating anything other than 1 as false? I realise this means that you are implicitly initialising someone else's variable should they serialise and then deserialise but it would seem to preserve the effect that you would witness should you use such a variable without performing that set of operations anyway so it would be an "invisible" side-effect.
Correct:
Initialise all your variables. Shoot all programmers who don't!
And of course, the one true way - tell everyone to initialise their variables or bad things might happen and then be lenient on parsing anyway.
Actually, my preferred way would be to trap the usage of an uninitialized bool variable when it is saved. It's not clear that I can do this. But a close substitute might be to convert the variable to an integer, throw an exception if it's not equal to 0 or 1, and serialize it otherwise.
The "Correct" method given is the only way. There's no way to detect uninitialized variables. (Anyone who took the effort to offer you a flag could have initialized the object instead.) You can't convert it to an integer, since that also requires reading an uninitialized variable. Games with "unsigned char [sizeof(bool)]" won't really work, since the bit pattern(s) for each Boolean state are unspecified.
As I understand it, the serialization layout of a type's components is up to the author. That means you can be selective:

    struct my_brokenness
    {
        bool is_next_one_valid;  // if false, don't set...
        bool the_real_value;     // ...this member.
        int  other;
    };

    archive & operator & ( archive &a, my_brokenness &b )
    {
        // the WRONG way (touches the_real_value even when it was never set):
        // a & b.is_next_one_valid & b.the_real_value & b.other;

        // the RIGHT way
        a & b.is_next_one_valid;
        if ( b.is_next_one_valid )
            a & b.the_real_value;
        a & b.other;
        return a;
    }

Something like this is required if a member is a "union," right?
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
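Daryle's layout can be turned into a self-contained, compilable sketch by replacing the archive with plain iostreams. The save and load functions below are hypothetical stand-ins for an archive's operator&, not Boost.Serialization API; the point is only that the flag is written and read first, and the optional member is touched only when the flag says it is valid.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

struct my_brokenness {
    bool is_next_one_valid;  // if false, the_real_value is never set...
    bool the_real_value;     // ...so it must not be read or written
    int  other;
};

// Save: write the flag, then the value only when the flag says it is valid.
void save(std::ostream& os, const my_brokenness& b) {
    os << b.is_next_one_valid << ' ';
    if (b.is_next_one_valid)
        os << b.the_real_value << ' ';
    os << b.other << ' ';
}

// Load mirrors save: read the flag first and use it to decide whether the
// optional field is present in the stream.
bool load(std::istream& is, my_brokenness& b) {
    is >> b.is_next_one_valid;
    if (is && b.is_next_one_valid)
        is >> b.the_real_value;
    else
        b.the_real_value = false;  // give the skipped member a defined value
    is >> b.other;
    return static_cast<bool>(is);
}
```

With std::ostringstream and std::istringstream in place of archives, an object whose flag is false saves as "0 42 " and loads back without ever touching the_real_value in the stream.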
Nigel Rantor
Robert Ramey wrote:
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
No, bools won't always contain 1 or 0; like other types, their value is undefined if they have not been initialised, depending upon where they have been declared.
An uninitialized bool doesn't have a value you can read without causing undefined behavior. So for all intents and purposes it doesn't have a value.
<caveat> C++ isn't my day job...I just use it for fun things... </caveat>
When bools are used in logical operations they are converted to integers
Can you cite the standard on that one? I'm pretty sure that it's the other way around: when integers are used in logical operations they are converted to bools.
, so depending on what your bool happens to contain before initialisation it could evaluate to either true or false.
Or it could crash your computer.
Perhaps part of the problem is that I used 0 and 1 for bool variables in order not to include the English strings "true" and "false" in text files, and to save space.
I'll think about this.
My two proposals would be one that encourages pragmatism and one that encourages correctness.
Pragmatic:
Well, how about simply treating anything other than 1 as false?
I don't know what 1 has to do with anything. The values of a bool are true and false. 1 is an int. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Nigel Rantor
writes:
Hi David,
Well, I don't have access to the standard itself; I am working from "The C++ Programming Language", 3rd Ed., so if I make some errors due to this you'll have to forgive me.
Robert Ramey wrote:
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
No, bools won't always contain 1 or 0; like other types, their value is undefined if they have not been initialised, depending upon where they have been declared.
An uninitialized bool doesn't have a value you can read without causing undefined behavior. So for all intents and purposes it doesn't have a value.
My understanding of uninitialised variables is that their *values* were undefined, that you could not rely on them to be any particular value, including not being within range for that type. So, you can read them, but there are no guarantees about what you'll get back. I suppose I agree: it really isn't a bool yet unless you can be assured that it contains true or false, but it does have a value.
<caveat> C++ isn't my day job...I just use it for fun things... </caveat>
When bools are used in logical operations they are converted to integers
Can you cite the standard on that one? I'm pretty sure that it's the other way around: when integers are used in logical operations they are converted to bools.
No, but I read it in the fourth para of sec 4.2 of TCPPPL. "In arithmetic and logical expressions, bools are converted to ints; integer arithmetic and logical operations are performed on the converted values."
, so depending on what your bool happens to contain before initialisation it could evaluate to either true or false.
Or it could crash your computer.
Computer or program? Really? Please elaborate, I'm interested. I don't see how accessing a piece of memory that hasn't been initialised to a legal value would cause that.
Perhaps part of the problem is that I used 0 and 1 for bool variables in order not to include the English strings "true" and "false" in text files, and to save space.
I'll think about this.
My two proposals would be one that encourages pragmatism and one that encourages correctness.
Pragmatic:
Well, how about simply treating anything other than 1 as false?
I don't know what 1 has to do with anything. The values of a bool are true and false. 1 is an int.
I was simply using the OP's terminology; I apologise. s/1/true/ in my above sentence. :-)
Anyway, the two approaches that I described for this problem still stand: tell people to do it right, or be lenient when parsing. If Robert can find a nice way of trapping uninitialised variables (not simply bools, of course) and throwing exceptions for these, then that's great. I'm not sure that is entirely possible, though (and I haven't spent any time thinking about it either).
n
Nigel Rantor wrote:
My understanding of uninitialised variables is that their *values* were undefined, that you could not rely on them to be any particular value, including not being within range for that type.
No, this is only true for "unsigned char". Accessing the value of an uninitialized object of any other type is undefined behavior, which means that you can - and in some cases will - get a hardware trap.
So, you can read them, ...
Not in portable code, although true in practice for most of today's hardware.
Peter Dimov wrote:
Nigel Rantor wrote:
My understanding of uninitialised variables is that their *values* were undefined, that you could not rely on them to be any particular value, including not being within range for that type.
No, this is only true for "unsigned char". Accessing the value of an uninitialized object of any other type is undefined behavior, which means that you can - and in some cases will - get a hardware trap.
Okay, cool. Kind of off-topic. Is there any particular reference you guys would recommend for the C++ standard instead of me being led astray by Bjarne's TCPPPL? Regards, n
Nigel Rantor wrote:
If Robert can find a nice way of trapping uninitialised variables (not simply bools, of course) and throwing exceptions for these, then that's great. I'm not sure that is entirely possible, though (and I haven't spent any time thinking about it either).
I checked my reference on the subject of bool/int conversion. This is Section 4.2, Booleans, in "The C++ Programming Language" by Stroustrup. I see that "in arithmetic or logical expressions, bools are converted to ints... if the result is converted back to bool, a 0 is converted to false and a non-zero value is converted to true".
Looking at the above and having considered the postings on the thread, my inclination is to do the following:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one. This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
This naturally suggests the following for floats and doubles.
* any attempt to save floats or doubles for which the result of isnan(..) is false will trap with an assertion. This would be a good thing from my standpoint, as up until now a NaN could be saved but not recovered, as the standard text stream input chokes on the NaN text. Meanwhile the binary input just loads the bits, whatever they were. This conflicts with my goal of making all archives behave alike.
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!". I'm inclined to reject this characterisation, basically because doing so will make my life easier. This will trap some users' errors at the cost of prohibiting some behavior that could be defended as correct.
Now I have one more damn problem. Some compilers (vc, bcb 5.51) use _isnan(...) while others (gcc) use isnan(...). Including <cmath> doesn't help here because it doesn't include isnan. Of course I can handle this with the #if/#endif blunt instrument, but if anyone has a better idea I would like to hear it.
Robert Ramey
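On the isnan portability question: since C++11, <cmath> declares std::isnan, which makes the #if/#endif choice between _isnan and isnan unnecessary on conforming compilers. For older compilers, a widely used fallback on IEEE 754 platforms is the self-comparison test, because NaN is the only value that compares unequal to itself. This is a sketch with illustrative function names; aggressive floating-point optimisation options (for example GCC's -ffast-math) can break the fallback.

```cpp
#include <cmath>
#include <limits>  // std::numeric_limits, handy for producing NaN/inf test values

// C++11 and later: std::isnan from <cmath>, no _isnan/isnan #if needed.
inline bool is_nan_std(double x) { return std::isnan(x); }

// Pre-C++11 fallback common on IEEE 754 platforms: NaN is the only value
// that compares unequal to itself. Fast-math compiler options may break it.
inline bool is_nan_selfcompare(double x) { return x != x; }
```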
"Robert Ramey"
Nigel Rantor wrote:
If Robert can find a nice way of trapping uninitialised variables (not simply bools, of course) and throwing exceptions for these, then that's great. I'm not sure that is entirely possible, though (and I haven't spent any time thinking about it either).
I checked my reference on the subject of bool/int conversion. This is Section 4.2, Booleans, in "The C++ Programming Language" by Stroustrup.
I see that "in arithmetic or logical expressions, bools are converted to ints... if the result is converted back to bool, a 0 is converted to false and a non-zero value is converted to true".
Looking at the above
Do not look at the above. It contradicts the standard, 5.14 (Logical AND operator), paragraph 1:
"The && operator groups left-to-right. The operands are both implicitly converted to type bool (clause 4)..."
You don't think compiler implementors refer to TC++PL when deciding how to write their compilers, do you?
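The difference between the two sources comes down to which operators are involved. A C++11 sketch (sum and both are illustrative names): arithmetic operators, and also the bitwise operators & and |, promote bool operands to int, whereas && converts its operands to bool and yields bool. Whether the book's "logical" was meant to cover the bitwise operators is only a guess.

```cpp
#include <type_traits>

// Arithmetic on bools is done in int...
static_assert(std::is_same<decltype(true + true), int>::value, "bool + bool is int");
// ...as are the bitwise operators...
static_assert(std::is_same<decltype(true | false), int>::value, "bool | bool is int");
// ...but && converts its operands to bool and yields bool.
static_assert(std::is_same<decltype(2 && 3), bool>::value, "&& yields bool");

constexpr int  sum(bool a, bool b) { return a + b; }   // 0, 1 or 2
constexpr bool both(int a, int b)  { return a && b; }  // operands converted to bool
```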
and having considered the postings on the thread my inclination is to do the following:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
What do you mean, "an assertion is thrown?" Only exceptions can be thrown.
This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
This naturally suggests the following for floats and doubles.
* any attempt to save floats or doubles for which the result of isnan(..) is false will trap with an assertion.
So, you're going to assert whenever someone tries to save a non-NaN? That sounds pretty useless. Even if you meant the opposite, it would be an unfortunate choice. Non-signalling NaNs should be serializable.
This would be a good thing from my standpoint as up until now a NaN could be saved but not recovered as the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
Meanwhile the binary input just loads the bits whatever they were. This conflicts with my goal of making all archives behave alike.
Good, then I'll (foolishly) assume you'll make NaNs serializable everywhere...
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!".
Damn straight.
I'm inclined to reject this characterisation basically because doing so will make my life easier. This will trap some user's errors at the cost of prohibiting some behavior that could be defended as correct.
I find it very distressing that you keep dismissing the needs of the numerics community. This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
What do you mean, "an assertion is thrown?" Only exceptions can be thrown.
I mean the assert macro is passed a value of false
This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
Hmm - here I have a situation where I can trap an error committed when an archive is saved that would not otherwise be detected until the archive is loaded. The fact that I can't trap it for all data types doesn't mean one should pass up the opportunity for bools. Since it's an assert, it will have no detrimental effect on the runtime of release-build code. What is the downside of doing this?
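A minimal sketch of the save-time check Robert describes, assuming bools are written as the digits 0 and 1; encode_bool_checked is a hypothetical name. As Robert says, assert() compiles to nothing when NDEBUG is defined. As Dave and Daryle point out elsewhere in the thread, though, the assertion cannot fire in a well-defined program, and reading an uninitialized bool to test it is already undefined behavior, so at best this is an opportunistic debugging aid.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical save-side check: convert the bool to int and assert that the
// result is 0 or 1 before encoding it. assert() expands to nothing when
// NDEBUG is defined, so release builds pay no cost. In a well-defined
// program the assertion always holds; it can only fail if undefined
// behavior (such as reading an uninitialized bool) has already occurred.
std::string encode_bool_checked(bool b) {
    const int v = b;  // always 0 or 1 for a valid bool
    assert(v == 0 || v == 1);
    std::ostringstream os;
    os << v;
    return os.str();
}
```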
This naturally suggests the following for floats and doubles.
* any attempt to save floats or doubles for which the result of isnan(..) is false will trap with an assertion.
So, you're going to assert whenever someone tries to save a non-NaN? That sounds pretty useless. Even if you meant the opposite,
LOL - of course I meant the opposite.
it would be an unfortunate choice. Non-signalling NaNs should be serializable.
Hmm - that sounds like a matter of opinion to me.
This would be a good thing from my standpoint as up until now a NaN could be saved but not recovered as the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
If indeed it is a problem.
Meanwhile the binary input just loads the bits whatever they were. This conflicts with my goal of making all archives behave alike.
...
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!".
Damn straight.
I'm inclined to reject this characterisation basically because doing so will make my life easier. This will trap some user's errors at the cost of prohibiting some behavior that could be defended as correct.
What I'm curious about is the utility of serializing a NaN. Why would someone want to do this? What does it usually mean? The only thing that occurs to me is that it would be uninitialized data. If NaN has been overloaded with some sort of meaning like "undetermined value" or something like that, I would think it's a questionable and error-prone practice. If that's the case, I don't see that it's a bad thing if the library fails to support it. So far only one user has raised the issue of having serialized a NaN and having it trap on reading back the archive. I don't read a whole lot into this, as this would only occur in text and xml archives, and perhaps others who do this are using binary archives. But it does suggest that this isn't a huge issue for people actually using the library.
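The text-archive failure described here is the one Dave suggested working around. One possible workaround, sketched with hypothetical write_double/read_double helpers rather than any existing archive API, is to write NaN and the infinities as fixed tokens (what operator<< prints for them is implementation-defined, and operator>> does not read them back) and to print finite values with max_digits10 significant digits so they round-trip. This sketch preserves the sign of zero but not the sign or payload bits of a NaN.

```cpp
#include <cmath>
#include <cstdlib>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical text encoding for doubles. NaN and the infinities are written
// as fixed tokens, because what operator<< prints for them is
// implementation-defined and operator>> cannot parse it back.
void write_double(std::ostream& os, double x) {
    if (std::isnan(x)) {
        os << "nan";
    } else if (std::isinf(x)) {
        os << (x < 0 ? "-inf" : "inf");
    } else {
        // max_digits10 significant digits let a finite double round-trip.
        // Note: this changes the stream's precision setting persistently.
        os.precision(std::numeric_limits<double>::max_digits10);
        os << x;
    }
    os << ' ';
}

// Reads one value written by write_double. Returns false on a bad token.
bool read_double(std::istream& is, double& x) {
    std::string token;
    if (!(is >> token))
        return false;
    if (token == "nan") {
        x = std::numeric_limits<double>::quiet_NaN();
    } else if (token == "inf") {
        x = std::numeric_limits<double>::infinity();
    } else if (token == "-inf") {
        x = -std::numeric_limits<double>::infinity();
    } else {
        char* end = nullptr;
        x = std::strtod(token.c_str(), &end);
        if (end == token.c_str() || *end != '\0')
            return false;  // empty or trailing garbage: not a number
    }
    return true;
}
```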
I find it very distressing that you keep dismissing the needs of the numerics community.
I have not done this. And I resent the accusation that I have.
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
Hmm - I can't imagine why anyone would want to do that. Someone might want to make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library. Robert Ramey
Robert Ramey wrote:
David Abrahams wrote:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
What do you mean, "an assertion is thrown?" Only exceptions can be thrown.
I mean the assert macro is passed a value of false
This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
Hmm - here I have a situation where I can trap an error committed when an archive is saved that would not otherwise be detected until the archive is loaded. The fact that I can't trap it for all data types doesn't mean one should pass up the opportunity for bools. Since it's an assert, it will have no detrimental effect on the runtime of release-build code. What is the downside of doing this?
I don't really have an opinion here, but why is it the job of the serialization library to attempt to detect completely unrelated programmer errors?
This naturally suggests the following for floats and doubles.
* any attempt to save floats or doubles for which the result of isnan(..) is false will trap with an assertion.
So, you're going to assert whenever someone tries to save a non-NaN? That sounds pretty useless. Even if you meant the opposite,
LOL - of course I meant the opposite.
it would be an unfortunate choice. Non-signalling NaNs should be serializable.
Hmm - that sounds like a matter of opinion to me.
This would be a good thing from my standpoint as up until now a NaN could be saved but not recovered as the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
If indeed it is a problem.
Meanwhile the binary input just loads the bits whatever they were. This conflicts with my goal of making all archives behave alike.
...
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!".
Damn straight.
I'm inclined to reject this characterisation basically because doing so will make my life easier. This will trap some user's errors at the cost of prohibiting some behavior that could be defended as correct.
What I'm curious about is what is the utility of serializing a NaN? Why would someone want to do this? What does it usually mean? The only thing that occurs to me is that it would be uninitialized data. If NaN has been overloaded with some sort of meaning like "undetermined value" or something like that I would think its a questionable and error prone practice. If that's the case, I don't see its a bad thing if the libray fails to support it.
NaN, +/- inf, +/- zero should all be serializable. NaN arises in arithmetic operations in numerous places: 0/0, inf/inf, sqrt(-1.0), and so on. Inf is perhaps more common, and more useful (some operations on inf return normal numbers, e.g. exp(-inf) == 0.0). This allows you to avoid inserting checks everywhere for invalid values; instead you just let the NaN/inf propagate through the calculation. Although I think it is quite rare to design floating point code to handle NaN, a few algorithms make use of inf (some of LAPACK, for example).
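The sketch below shows how these values arise in ordinary arithmetic. It assumes IEEE 754 doubles (std::numeric_limits<double>::is_iec559) and no "fast math" compiler option. 0/0 is left out only because C++ itself does not define division by zero, even though IEEE 754 does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct nonfinite_demo {
    double inf_over_inf;       // expected NaN
    double sqrt_of_minus_one;  // expected NaN (domain error)
    double propagated;         // NaN fed through further arithmetic stays NaN
    double exp_of_minus_inf;   // expected 0.0: inf can yield a normal number
};

// minus_one is passed at run time so the compiler cannot fold the result.
inline nonfinite_demo make_nonfinite_demo(double minus_one) {
    const double inf = std::numeric_limits<double>::infinity();
    nonfinite_demo r;
    r.inf_over_inf = inf / inf;
    r.sqrt_of_minus_one = std::sqrt(minus_one);
    r.propagated = r.inf_over_inf * 2.0 + 1.0;
    r.exp_of_minus_inf = std::exp(-inf);
    return r;
}
```

Calling make_nonfinite_demo(-1.0) yields three NaNs and a plain 0.0. This is why such values can turn up in data structures without any deliberate "overloading" of NaN.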
So far only one user has raised the issue of having serialized a NaN and having it trap on reading back the archive. I don't read a whole lot into this as this would only occur in text and xml archives and perhaps others who do this are using binary archives. But it does suggest that this isn't a huge issue for people actually using the library.
That isn't true; see below
I find it very distressing that you keep dismissing the needs of the numerics community.
I have not done this. And I resent the accusation that I have.
The issue with NaN/inf has come up before: http://lists.boost.org/Archives/boost/2004/07/67259.php http://lists.boost.org/boost-users/2005/01/9359.php Well over a year later, and no progress? Not to mention fast array serialization, for which Matthias Troyer has, to my mind, presented convincing arguments that it should be part of the core functionality.
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
I have heard mumblings to that effect.
Hmm - I can't imagine why anyone would want to do that.
and THAT, I think, is the crux of the problem.
Someone might want to make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
Regards, Ian McCulloch
Ian McCulloch wrote:
Robert Ramey wrote:
Hmm - here I have a situation where I can trap an error committed when an archive is saved that would not otherwise be detected until the archive is loaded. The fact that I can't trap it for all data types doesn't mean one should pass up the opportunity for bools. Since it's an assert, it will have no detrimental effect on the runtime of release build code. What is the downside of doing this?
I don't really have an opinion here, but why is it the job of the serialization library to attempt to detect completely unrelated programmer errors?
Every time someone has a problem of this nature, they ask me. So naturally I have a lot of incentive to see that these things are trapped before they get to me. So for me personally, there is an upside. The question remains - What is the downside of doing this?
What I'm curious about is what is the utility of serializing a NaN? Why would someone want to do this? What does it usually mean? The only thing that occurs to me is that it would be uninitialized data. If NaN has been overloaded with some sort of meaning like "undetermined value" or something like that, I would think it's a questionable and error-prone practice. If that's the case, I don't see that it's a bad thing if the library fails to support it.
NaN, +/- inf, +/- zero should all be serializable. NaN arises in arithmetic operations in numerous places: 0/0, inf/inf, sqrt(-1.0), and so on. Inf is perhaps more common, and more useful (some operations on inf return normal numbers, e.g. exp(-inf) == 0.0). This allows you to avoid inserting checks everywhere for invalid values; instead you just let the NaN/inf propagate through the calculation. Although I think it is quite rare to design floating point code to handle NaN, a few algorithms make use of inf (some of LAPACK, for example).
Your reference below - http://lists.boost.org/boost-users/2005/01/9359.php - touches upon this subject in an interesting way. "This [NaN problem] seems to be a result of relying on the, AFAIK, undefined behavior of writing NaN/infinity to a stream; it will work correctly with some standard library implementations, but not all." I don't know what the standard says about writing a NaN or +/-inf to a stream and reading it back. Also, what do most I/O stream implementations do? Certainly, the users that have had the problem aren't using stream libraries which support this behavior. Then the question arises: if it isn't important enough for the common stream library implementations to support it, why should we invest effort in it? I would guess that the reason the problem has come up so infrequently is that users don't need to save/load these types of values very often, or it's easy to work around. Otherwise the standard stream library would have addressed this stuff long ago, and of course we wouldn't be discussing the issue now.
So far only one user has raised the issue of having serialized a NaN and having it trap on reading back the archive. I don't read a whole lot into this as this would only occur in text and xml archives and perhaps others who do this are using binary archives. But it does suggest that this isn't a huge issue for people actually using the library.
That isn't true; see below
Sorry, two users: one in a thread from July 2004 and the most recent one in January 2005.
I find it very distressing that you keep dismissing the needs of the numerics community.
I have not done this. And I resent the accusation that I have.
The issue with NaN/inf has come up before: http://lists.boost.org/Archives/boost/2004/07/67259.php http://lists.boost.org/boost-users/2005/01/9359.php
Well over a year later, and no progress?
LOL - what have YOU accomplished in the last year?
Not to mention fast array serialization, for which Matthias Troyer has, to my mind, presented convincing arguments that it should be part of the core functionality.
He's wrong about this. There is no advantage to including it in the place he wants to. Everything he wants to achieve can be obtained without mucking about in the library itself, which is already large and complex enough. The final result would be less tightly coupled, more reliable, optional for users who don't need it, more easily testable and demonstrably correct.
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
I have heard mumblings to that effect.
Hmm - I can't imagine why anyone would want to do that.
and THAT, I think, is the crux of the problem.
OK, let me rephrase that. There would be no advantage to doing that since one could ->
make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
Regards, Ian McCulloch
"Robert Ramey"
Ian McCulloch wrote:
Robert Ramey wrote:
Your reference below - http://lists.boost.org/boost-users/2005/01/9359.php - touches upon this subject in an interesting way. "This [NaN problem] seems to be a result of relying on the, AFAIK, undefined behavior of writing NaN/infinity to a stream; it will work correctly with some standard library implementations, but not all." I don't know what the standard says about writing/reading a NaN or +/-inf to a stream and reading it back.
I recommend getting a copy of the C and C++ standards. It's cheap, easy, and well worth the investment. When using the default standard "C" locale, floating point numbers are converted according to the corresponding stdio functions in the C standard. The C standard describes how NaNs are converted in 7.19.6.1 and 7.19.6.2. It specifies that NaNs are read back in the same way they're written out.
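To make the C-library route concrete, here is a small sketch. It assumes a C99/C++11-conforming library, where the printf family writes NaN as "nan" (possibly "-nan") and infinity as "inf"/"-inf", and strtod accepts those spellings back. The helper names are made up. On some implementations, iostream extraction (operator>>) does not accept the same text even where strtod does, which matches the asymmetry users reported.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Format with the printf family; 17 significant digits round-trip a double.
inline std::string format_with_printf(double d) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", d);
    return buf;
}

// Parse with strtod, which accepts "nan", "inf", "infinity" (any case, optional sign).
inline double parse_with_strtod(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}
```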
Also what do most i/o stream implementations do? Certainly, the users that have had the problem aren't using stream libraries which support this behavior. Then the question arises, if it isn't important enough for the common stream library implementations to support it, why should we invest effort in it.
Because it's important for your users.
I would guess that the reason the problem has come up so infrequently is that users don't need to save/load these types of values very often, or it's easy to work around. Otherwise the standard stream library would have addressed this stuff long ago, and of course we wouldn't be discussing the issue now.
We're discussing it now because you announced a plan to assert when passed a NaN.
Not to mention fast array serialization, for which Matthias Troyer has, to my mind, presented convincing arguments that it should be part of the core functionality.
He's wrong about this. There is no advantage to including it in the place he wants to.
You keep on asserting that, but when he explained the advantage, you didn't reply.
Everything he wants to achieve can be obtained without mucking about in the library itself, which is already large and complex enough.
Matthias isn't wrong. For what it's worth, if I were in your shoes I'd be thinking *very* carefully before making that assertion so baldly, because the chances of being right are so slim. My experience of Matthias is that he's smarter than the rest of us put together.
The final result would be less tightly coupled, more reliable, optional for users who don't need it, more easily testable and demonstrably correct.
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
I have heard mumblings to that effect.
Hmm - I can't imagine why anyone would want to do that.
and THAT, I think, is the crux of the problem.
OK, let me rephrase that. There would be no advantage to doing that since one could ->
make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
A special archive class doesn't help if the interface for serializing arrays isn't a standard part of the archive concept. Everyone who writes a datatype containing an array will need to use that interface in his serialize function. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
"Robert Ramey"
writes: OK, let me rephrase that. There would be no advantage to doing that since one could ->
make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
A special archive class doesn't help if the interface for serializing arrays isn't a standard part of the archive concept. Everyone who writes a datatype containing an array will need to use that interface in his serialize function.
The archive class concept indicates that the following expression is true

ar << t

for any serializable type t. The documentation defines a serializable type as:

A type T is Serializable if and only if one of the following is true:
a. it is a primitive type. In this document, we use the term primitive type to mean types whose data is simply saved/loaded to/from an archive with no further processing. Arithmetic (including characters), bool, enum, std::string and std::wstring types are primitive types. Using serialization traits, any user type can also be designated as "primitive" so that it is handled in this way.
b. It is a class type and for all Archive classes, one of the following has been declared:
   a. a class member function serialize
   b. a global function serialize
c. it is a pointer to a Serializable type.
d. it is a reference to a Serializable type.
e. it is a native C++ array of Serializable type. That is, any array is serializable if its elements are serializable.

Doesn't that cover it? Robert Ramey
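As an illustration of rule (e), the sketch below uses a toy archive, toy_oarchive, which is entirely hypothetical and not Boost's. It handles a native array by writing the element count and then serializing each element, so nested arrays work recursively. Note that this covers only arrays whose size is part of the type; a pointer plus a run-time count is not a native array.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical toy output archive: primitives are written directly; a
// native C++ array is serialized element by element.
class toy_oarchive {
public:
    toy_oarchive& operator<<(int v) { os_ << v << ' '; return *this; }
    toy_oarchive& operator<<(double v) { os_ << v << ' '; return *this; }

    template <typename T, std::size_t N>
    toy_oarchive& operator<<(const T (&a)[N]) {
        os_ << N << ' ';             // element count first
        for (std::size_t i = 0; i < N; ++i)
            *this << a[i];           // each element must itself be serializable
        return *this;
    }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};
```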
On 11/11/05 4:40 PM, "Robert Ramey"
David Abrahams wrote:
"Robert Ramey"
writes: OK, let me rephrase that. There would be no advantage to doing that since one could ->
make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
A special archive class doesn't help if the interface for serializing arrays isn't a standard part of the archive concept. Everyone who writes a datatype containing an array will need to use that interface in his serialize function.
The archive class concept indicates that the following expression is true
ar << t
for any serializable type t.
The documentation defines a serializable type as:
A type T is Serializable if and only if one of the following is true:
a. it is a primitive type. In this document, we use the term primitive type to mean types whose data is simply saved/loaded to/from an archive with no further processing. Arithmetic (including characters), bool, enum, std::string and std::wstring types are primitive types. Using serialization traits, any user type can also be designated as "primitive" so that it is handled in this way.
b. It is a class type and for all Archive classes, one of the following has been declared:
   a. a class member function serialize
   b. a global function serialize
c. it is a pointer to a Serializable type.
d. it is a reference to a Serializable type.
e. it is a native C++ array of Serializable type. That is, any array is serializable if its elements are serializable.

Doesn't that cover it?
I take it that member pointers (both data and function) cannot be serialized, right? I guess I would map each qualifying member to a number (or string if open-ended) and serialize that number.

I think we need two new archive loading primitives.

1. Array segment. You already have compile-time arrays covered, but I don't see anything about run-time arrays. (Surprising, considering how often CT-arrays are "dissed" for RT-arrays.) A run-time array is defined by a pointer to the first element and an element count:

    // I'm just guessing what your base type looks like
    class archive_base
    {
        //...
        template < typename T >
        status_type  archive_rtarray( T *a, size_t &c )
        {
            this->archive( c );
            for ( size_t i = c ; i ; --i, ++a )
                this->archive( *a );
            return OK;
        }
        //...
    };

Any class that maintains its own array of elements that doesn't use a smart container, standard or otherwise, would need this for serialization.

And what if a class uses a non-standard container? We could use an iterator-based routine:

    class archive_base
    {
        //...
        template < typename T >
        status_type  archive_container_save( T const &c )
        {
            typedef typename T::const_iterator  iterator;
            typedef typename T::size_type       size_type;

            // Save the element count first, so the load routine knows how many to read.
            size_type const  s = c.size();
            this->archive_save( s );

            iterator const  e = c.end();
            for ( iterator b = c.begin() ; e != b ; ++b )
                this->archive_save( *b );
            // You would probably use something kewl
            // from Boost.bind/lambda here instead.
            return OK;
        }

        template < typename T >
        status_type  archive_container_load( T &c )
        {
            typedef typename T::size_type  size_type;

            size_type  s;
            this->archive_load( s );

            typename T::value_type  temp;   // standard containers name this value_type
            for ( size_type i = 0 ; i < s ; ++i )
            {
                this->archive_load( temp );
                c.push_back( temp );
            }
            return OK;
        }
        //...
    };

The routines for the standard containers could call these primitives. The one for std::vector could use the RT-array primitive.

Finally, a fast archive could be keyed to a type's POD status. (I think we have a type-trait for that.) We would make a subclass of the binary archive class that has overrides for POD types.
Those types would dump their bytes directly, even if they're arrays or structures. Obviously we give up portability for speed. My new RT-array primitive should also be optimized for POD types. Non-PODs would call their regular coding function, hopefully getting a speed-up from the individual component speed-ups. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
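A runnable variant of the two primitives above is sketched below. It uses a hypothetical pair of text archives (sketch_oarchive and sketch_iarchive) instead of the guessed archive_base, and plain member templates instead of status codes. The count is written before the elements and read back first, so save and load stay symmetric.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <sstream>
#include <stdexcept>
#include <string>

class sketch_oarchive {
public:
    template <typename T> void save(const T& v) { os_ << v << ' '; }

    // Run-time array: pointer to first element plus element count.
    template <typename T> void save_rtarray(const T* a, std::size_t c) {
        save(c);
        for (std::size_t i = 0; i < c; ++i) save(a[i]);
    }

    // Any container exposing size(), begin() and end().
    template <typename C> void save_container(const C& c) {
        save(c.size());
        for (typename C::const_iterator b = c.begin(), e = c.end(); b != e; ++b)
            save(*b);
    }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

class sketch_iarchive {
public:
    explicit sketch_iarchive(const std::string& s) : is_(s) {}
    template <typename T> void load(T& v) { is_ >> v; }

    // Caller supplies storage for at most `capacity` elements; returns the count read.
    template <typename T> std::size_t load_rtarray(T* a, std::size_t capacity) {
        std::size_t c = 0;
        load(c);
        if (c > capacity) throw std::length_error("array segment too large");
        for (std::size_t i = 0; i < c; ++i) load(a[i]);
        return c;
    }

    // Any container with size_type, value_type and push_back().
    template <typename C> void load_container(C& c) {
        typename C::size_type s = 0;
        load(s);
        for (typename C::size_type i = 0; i < s; ++i) {
            typename C::value_type temp;
            load(temp);
            c.push_back(temp);
        }
    }

private:
    std::istringstream is_;
};
```

Usage: save an int[3] with save_rtarray and a std::list<int> with save_container, then read both back in the same order.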
1. Array segment.
And what if a class uses a non-standard container? We could use an iterator-based routine:
Any user-defined type can specify its serialization independent of the archive. With a few special exceptions, the serialization library has almost no "built-in knowledge" of specific types. To see how a custom collection would be handled, you might just check how the STL collections have been handled. They aren't special in any way. The array segment would be an interesting one. For ideas on this, one might see how name-value pairs are serialized. Also, binary object might contain some good ideas. This kind of thing is discussed under the theme "serialization wrappers" in the documentation.
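The wrapper approach described above can be sketched as follows. array_segment, make_array_segment and plain_oarchive are hypothetical names and are not the library's API. The wrapper holds a pointer and a count and knows how to save itself, while the archive only forwards class types to their save member. The archive therefore needs no built-in knowledge of run-time arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical wrapper in the spirit of name-value pairs.
template <typename T>
struct array_segment {
    const T* data;
    std::size_t count;

    template <typename Archive>
    void save(Archive& ar) const {
        ar << count;
        for (std::size_t i = 0; i < count; ++i) ar << data[i];
    }
};

template <typename T>
array_segment<T> make_array_segment(const T* p, std::size_t n) {
    array_segment<T> s = {p, n};
    return s;
}

// Minimal archive: streams a few primitive types; anything else is assumed
// to be a class type with a save(Archive&) member.
class plain_oarchive {
public:
    plain_oarchive& operator<<(std::size_t v) { os_ << v << ' '; return *this; }
    plain_oarchive& operator<<(int v) { os_ << v << ' '; return *this; }

    template <typename T>
    plain_oarchive& operator<<(const T& t) { t.save(*this); return *this; }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};
```

For example, saving make_array_segment(v + 1, 2) for int v[4] = {1, 2, 3, 4} writes the count 2 followed by the elements 2 and 3.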
The routines for the standard containers could call these primitives. The one for std::vector could use the RT-array primitive.
No one has done valarray so far. There are probably other collections out there that might be interesting as well. No one has submitted any, so maybe there is no demand. Or maybe, since it is pretty easy, people are just making them, using them and moving on. Or maybe, for all I know, not that many people are even using the library.
Finally, a fast archive could be keyed to a type's POD status. (I think we have a type-trait for that.) We would make a subclass of the binary archive class that has overrides for POD types. Those types would dump their bytes directly, even if they're arrays or structures. Obviously we give up portability for speed. My new RT-array primitive should also be optimized for POD types. Non-PODs would call their regular coding function, hopefully getting a speed-up from the individual component speed-ups.
This is a great idea!!! See my response to Matthias's proposal for my take on it. Robert Ramey
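The POD fast path could be sketched as follows. It assumes C++11's std::is_trivially_copyable as the "POD-ness" trait, and fast_binary_oarchive is a made-up class. Arrays of trivially copyable elements are dumped as one raw block, while other element types fall back to per-element saving. As noted, the raw bytes are not portable across platforms (endianness, padding, type sizes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

class fast_binary_oarchive {
public:
    template <typename T>
    void save_array(const T* a, std::size_t n) {
        save_raw(&n, sizeof n);  // element count
        save_array_impl(a, n, std::is_trivially_copyable<T>());
    }

    // Per-element saving for a non-trivial type used in the example.
    void save(const std::string& s) {
        std::size_t len = s.size();
        save_raw(&len, sizeof len);
        save_raw(s.data(), len);
    }

    const std::vector<unsigned char>& bytes() const { return buf_; }
    std::size_t block_copies() const { return block_copies_; }

private:
    void save_raw(const void* p, std::size_t len) {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        buf_.insert(buf_.end(), c, c + len);
    }

    // Fast path: one bulk copy of the whole array.
    template <typename T>
    void save_array_impl(const T* a, std::size_t n, std::true_type) {
        save_raw(a, n * sizeof(T));
        ++block_copies_;
    }

    // Fallback: each element goes through its regular save().
    template <typename T>
    void save_array_impl(const T* a, std::size_t n, std::false_type) {
        for (std::size_t i = 0; i < n; ++i) save(a[i]);
    }

    std::vector<unsigned char> buf_;
    std::size_t block_copies_ = 0;
};
```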
Ian McCulloch wrote:
I don't really have an opinion here, but why is it the job of the serialization library to attempt to detect completely unrelated programmer errors?
My concern is that the library can write out something it cannot read back in... this conflicts with its most obvious use: taking an object, writing it to disc, and reading it back in exactly as it was before writing to disc (assuming the same machine for reading and writing, and ignoring memory locations for pointers, etc.). I think in the case reported by Paul, he's not necessarily using the uninitialised value, as it's an object that is kind of like a discriminated union. I think this usage parallels the idea of NaNs etc. in floating point. I'd expect these to be read back in too, as you suggested. However, having a debug flag separate from NDEBUG that puts in stricter checks at the expense of serialisation speed is a good idea IMO. This should be orthogonal to the serialisation of a bool/float/whatever. Kevin
--
| Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this      |
| Senior Technology                     | My employer for certain |
| And Network Systems Architect         | Not even myself         |
Kevin Wheatley wrote:
Ian McCulloch wrote:
I don't really have an opinion here, but why is it the job of the serialization library to attempt to detect completely unrelated programmer errors?
My concern is that the library can write out something it cannot read back in... this conflicts with its most obvious use: taking an object, writing it to disc, and reading it back in exactly as it was before writing to disc (assuming the same machine for reading and writing, and ignoring memory locations for pointers, etc.).
I quite agree. Cutting through all the discussion that there has been on this subject, surely the issue here, as has already been said, is that the library should trap the error when *writing* any value that will cause an error when it is read back in, instead of when it is being *read back in*, as is currently happening. Allowing a value to be written that cannot be read back looks like a bug to me. Consideration of whether or not to allow reading and writing of NaN, +/-inf, etc., is a side issue (albeit one worthy of discussion), as I would expect the read and write behaviour to be the same for these values. If it is not, this is something else that needs to be considered. Paul
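One way to trap the error at write time is sketched below, under the assumption that an extra parse per value is acceptable in a checking build (for instance behind its own debug switch, separate from NDEBUG, as Kevin suggested). text_round_trips is a hypothetical helper: it formats the value, then reads the text back with the same stream machinery a loader would use. Whether NaN passes depends on the standard library in use, which is exactly the asymmetry under discussion.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Returns true if the text written for `value` can be parsed back to an
// equal value (any NaN counts as equal to any NaN).
template <typename T>
bool text_round_trips(const T& value) {
    std::ostringstream os;
    os.precision(std::numeric_limits<T>::max_digits10);  // only affects floating point
    os << value;
    std::istringstream is(os.str());
    T back = T();
    if (!(is >> back)) return false;          // the loader would fail here
    if (value != value) return back != back;  // value is NaN
    return back == value;
}
```

A save routine could call this before writing and raise an error immediately, so the failure is reported while the offending object is still on the stack.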
On 11/11/05 5:12 AM, "Kevin Wheatley"
Ian McCulloch wrote:
I don't really have an opinion here, but why is it the job of the serialization library to attempt to detect completely unrelated programmer errors?
My concern is that the library can write out something it cannot read back in... this conflicts with its most obvious use: taking an object, writing it to disc, and reading it back in exactly as it was before writing to disc (assuming the same machine for reading and writing, and ignoring memory locations for pointers, etc.).
In this case, we would have a bug in the decoding and encoding routines. The bug would be that they don't match. If the coding routines are calling the standard library (like I think they are for text archives of primitive types), then the bug is from the standard library not being symmetric. I think the standard library is supposed to give symmetric text I/O, so how much effort should we put into working around such bugs? Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
I think in the case reported by Paul, he's not necessarily using the uninitialised value, as it's an object that is kind of like a discriminated union. I think this usage parallels the idea of NaNs etc. in floating point. I'd expect these to be read back in too, as you suggested.
The problems are not in parallel. For a discriminated union, it is the responsibility of the coding author to determine which fields are active and only read/write those particular fields and skip the inactive fields. The unusability of NaN values is from a high-level perspective, such values are still valid objects from a low-level view. (And the high-level view is just an opinion; some programmers might want to keep NaNs around as a flag.)
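The responsibility described above can be illustrated with a small hypothetical discriminated union, number_or_flag. Its save and load write and read the tag plus the active member only, so the inactive member, which may never have been initialized, is never touched.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct number_or_flag {
    enum kind_t { number = 0, flag = 1 };
    kind_t kind;
    union {
        double d;  // active when kind == number
        bool b;    // active when kind == flag
    };

    void save(std::ostream& os) const {
        os << static_cast<int>(kind) << ' ';
        if (kind == number) os << d << ' ';
        else os << (b ? 1 : 0) << ' ';
    }

    void load(std::istream& is) {
        int k = 0;
        is >> k;
        kind = static_cast<kind_t>(k);
        if (kind == number) {
            is >> d;
        } else {
            int v = 0;
            is >> v;
            b = (v != 0);
        }
    }
};
```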
However, having a debug flag separate from NDEBUG that puts in stricter checks at the expense of serialisation speed is a good idea IMO. This should be orthogonal to the serialisation of a bool/float/whatever.
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
I buy Robert's argument that, in the case of bool, the assertion will often help him diagnose the problem and help his users when they complain that the library isn't working because they forgot to initialize something. That same argument doesn't work for NaNs, for all kinds of reasons. -- Dave Abrahams Boost Consulting www.boost-consulting.com
On 11/13/05 10:01 AM, "David Abrahams" wrote:
Daryle Walker writes:
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
I buy Robert's argument that, in the case of bool, the assertion will often help him diagnose the problem and help his users when they complain that the library isn't working because they forgot to initialize something. [TRUNCATE]
Since Robert's test involves undefined behavior, it's not portable. The assert may work on his system but may crash & burn on a different environment. The assert didn't help the second user at all. And what happens if the uninitialized variable just happens to match a valid state? -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker wrote:
On 11/13/05 10:01 AM, "David Abrahams" wrote:
Daryle Walker writes:
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
I buy Robert's argument that, in the case of bool, the assertion will often help him diagnose the problem and help his users when they complain that the library isn't working because they forgot to initialize something.
[TRUNCATE]
Since Robert's test involves undefined behavior, it's not portable. The assert may work on his system but may crash & burn on a different environment. The assert didn't help the second user at all. And what happens if the uninitialized variable just happens to match a valid state?
If users complain that the library isn't working because they forgot to initialize something, and this is included in the documentation as a reason for the exception that is thrown (stream error), as I have suggested, then users might well still complain, but they can then be told to RTFM :) There's no need to pander to slipshod programmers by trying to deal with their broken code - all that is needed is an update to the documentation. Simple solution. Paul
Paul Giaccone
Daryle Walker wrote:
Since Robert's test involves undefined behavior, it's not portable. The assert may work on his system but may crash & burn on a different environment. The assert didn't help the second user at all. And what happens if the uninitialized variable just happens to match a valid state?
If users complain that the library isn't working because they forgot to initialize something, and this is included in the documentation as a reason for the exception that is thrown (stream error)
An exception is inappropriate as a documented response to uninitialized data. Any test that could detect uninitialized data causes undefined behavior anyway, so you can't promise to throw an exception. The program counter could already be in never-never land. -- Dave Abrahams Boost Consulting www.boost-consulting.com
Daryle Walker wrote:
Since Robert's test involves undefined behavior, it's not portable. The assert may work on his system but may crash & burn on a different environment. The assert didn't help the second user at all. And what happens if the uninitialized variable just happens to match a valid state?
nothing ... the archive would still report as being written correctly and would load correctly too. From the point of view of least surprise, at least this is better than writing out rubbish, not reporting an error and failing to load it back in. The library would be robust under more circumstances. Is there a compiler available that does something bad (silently breaks something), where this really is a problem? Most people will assume (yes, I know) that testing against the false condition (false, 0, whatever) would be all you would see in the code when testing a boolean anyway. (David A. even suggested code for serialising based on this, stream << (foo?"t":"f"), so I don't feel too bad saying this :-) Maybe under optimisation the alternate behaviour of testing against true might mean the compiler decides that the whole assert in this case must be a tautology and eliminates it, but I'd only expect this under a release build, which would not have the assert code anyway. Methinks there are bigger fish to fry here... Kevin
--
| Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this      |
| Senior Technology                     | My employer for certain |
| And Network Systems Architect         | Not even myself         |
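The "t"/"f" encoding mentioned above can be sketched as follows, using hypothetical helper names. For any valid bool, the conditional emits one of two characters that the loader always recognizes. Reading an uninitialized bool in the first place is still undefined behaviour, as Daryle points out, so this makes the common failure mode benign rather than guaranteed safe.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

inline void save_bool_tf(std::ostream& os, bool b) {
    os << (b ? 't' : 'f') << ' ';  // always one of two loadable characters
}

inline bool load_bool_tf(std::istream& is) {
    char c = 0;
    is >> c;
    if (c == 't') return true;
    if (c == 'f') return false;
    throw std::runtime_error("invalid bool in archive");
}
```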
Daryle Walker
On 11/13/05 10:01 AM, "David Abrahams" wrote:
Daryle Walker writes:
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
I buy Robert's argument that, in the case of bool, the assertion will often help him diagnose the problem and help his users when they complain that the library isn't working because they forgot to initialize something. [TRUNCATE]
Since Robert's test involves undefined behavior, it's not portable.
The test is completely portable. The function's requirements are that the data passed is valid. Under those circumstances the test is completely valid. If the user passes an uninitialized bool he has already invoked undefined behavior and all bets are off.
The assert may work on his system but may crash & burn on a different environment.
It can only crash and burn if the user has already invoked undefined behavior by passing an uninitialized bool, and in that case leaving out the assert is no protection against a crash -- the library is about to read the bool in order to write it into the stream, which reading is just as likely to cause the same crash.
The assert didn't help the second user at all.
The second user?
And what happens if the uninitialized variable just happens to match a valid state?
The bug goes undetected for now, and the assert neither helps nor hurts. -- Dave Abrahams Boost Consulting www.boost-consulting.com
Daryle Walker wrote:
In this case, we would have a bug in the decoding and encoding routines. The bug would be that they don't match. If the coding routines are calling the standard library (like I think they are for text archives of primitive types), then the bug is from the standard library not being symmetric. I think the standard library is supposed to give symmetric text I/O
I don't know what the standard library is supposed to do. But the fact is that at least some implementations of the standard library are not handling text I/O symmetrically in at least two cases:
* uninitialized bools
* float/double NaN, +/- inf, etc.
so how much effort should we put into working around such bugs?
Actually, the effort for uninitialized bool is pretty trivial, and I've incorporated an assertion into the appropriate spot. For the others, it's a little more work. If someone has enough interest to actually make and test the changes, I'll be happy to receive them, check them, and incorporate them into the code. I would expect that only some small changes in ??text_i/oprimitive would be necessary. Oh, it would be a bad idea to post them to the list. Personally, I'm of the view that trying to serialize a NaN would probably be a bug in user code. I'm aware that not everyone would agree with this. Maybe throwing an exception could be enabled/disabled with another flag applied at archive open time (like no_header, etc.). Anyway, it's not a big issue for me. Presumably, if someone has interest, he can submit his improvements and we can discuss it then.
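The open-time flag idea could look roughly like the sketch below. Both the flag name and its value are invented for illustration and do not belong to the library, which already has flags such as no_header. With the flag set, saving a non-finite value throws. Without it, the value is written as-is.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <stdexcept>

// Hypothetical flag; name and value are made up.
const unsigned sketch_throw_on_nonfinite = 16;

class sketch_text_oarchive {
public:
    sketch_text_oarchive(std::ostream& os, unsigned flags) : os_(os), flags_(flags) {}

    void save(double d) {
        if ((flags_ & sketch_throw_on_nonfinite) && !std::isfinite(d))
            throw std::domain_error("non-finite floating point value");
        os_ << d << ' ';
    }

private:
    std::ostream& os_;
    unsigned flags_;
};
```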
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
lol - and furthermore, if we catch him doing something like this, we should make sure we don't tell him, so he gets his deserved punishment!!!!
I think in the case reported by Paul, he's not necessarily using the uninitialised value, as it's an object that is kind of like a discriminated union. I think this usage parallels the idea of NaNs etc. in floating point. I'd expect these to be read back in too, as you suggested.
The problems are not in parallel. For a discriminated union, it is the responsibility of the coding author to determine which fields are active and only read/write those particular fields and skip the inactive fields. The unusability of NaN values is from a high-level perspective, such values are still valid objects from a low-level view. (And the high-level view is just an opinion; some programmers might want to keep NaNs around as a flag.)
This articulates my view. I used the term "overloading" as in semantic overloading where we might use NaN to mean something specific. I can see where this might be useful in some narrow contexts but I would generally consider it an error prone practice. Just one man's opinion. Robert Ramey
On Sun, Nov 13, 2005 at 10:26:49AM -0800, Robert Ramey wrote:
I don't know what the standard library is supposed to do. But the fact is that at least some implementations of the standard library are not handling text I/O symmetrically in at least two cases:
* uninitialized bools
* float/double NaN, +/- inf, etc.
We hit this problem early on. We're serializing things where NaN is overloaded to mean both NaN and "uninitialized", and where +/- inf are perfectly valid values. We often need to serialize structures between runs that contain sections that haven't been initialized. A couple of our platforms have exactly this bug. I'll dig up my changes and send them to you. IIRC they were three or four lines each. -t
troy d. straszheim wrote:
On Sun, Nov 13, 2005 at 10:26:49AM -0800, Robert Ramey wrote:
I don't know what the standard library is supposed to do. But the fact is that at least some implementations of he standard library are not handling text i/o symetically in at least two cases:
uninitialized bools; float/double NaN, +/- inf, etc.
We hit this problem early on. We're serializing things where NaN is overloaded to mean both NaN and "uninitialized", and where +/- inf are perfectly valid values. We often need to serialize structures between runs that contain sections that haven't been initialized. A couple of our platforms have exactly this bug.
I'll dig up my changes and send them to you. IIRC they were three or four lines each.
How about adding a little bit to one of the files in serialization/test to test these things? Send that along with your changes. Probably adding a couple of variables to A.hpp will be sufficient. Truth is, I don't even know how one goes about assigning a NaN or a +/- inf to a float/double variable !! Robert Ramey
On Sun, 2005-11-13 at 16:02 -0800, Robert Ramey wrote:
Truth is, I don't even know how one goes about assigning a NaN or a +/- inf to a float/double variable !!
Robert Ramey
I found this link which shows code from the libxml2 project. http://cvs.gnome.org/viewcvs/libxml2/trionan.c?rev=1.14 It's interesting to note the 'trio_pinf' function for setting a double to positive infinity. They have a function called 'trio_ninf' for setting a double to negative infinity, and finally 'trio_nan' for handling NaN. Is this what you're looking for, Robert? Stephen
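Standard C++ also offers a portable route that needs no platform-specific helpers: std::numeric_limits<double> provides quiet_NaN() and infinity() whenever has_quiet_NaN and has_infinity are true, as on IEEE 754 platforms. (Dividing 0.0 by 0.0 also yields a NaN there.) A minimal sketch follows; the helper names are hypothetical. The comparison helper is included because a NaN never compares equal to itself, so a round-trip test has to check it with std::isnan.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// These helpers only make sense where the type supports the special values.
static_assert(std::numeric_limits<double>::has_quiet_NaN, "no quiet NaN");
static_assert(std::numeric_limits<double>::has_infinity, "no infinity");

// Hypothetical helpers for building test values, e.g. extra members in A.hpp.
double make_nan()     { return std::numeric_limits<double>::quiet_NaN(); }
double make_pos_inf() { return std::numeric_limits<double>::infinity(); }
double make_neg_inf() { return -std::numeric_limits<double>::infinity(); }

// NaN != NaN, so "did the value survive?" must treat two NaNs as a match.
bool same_or_both_nan(double a, double b) {
    return (std::isnan(a) && std::isnan(b)) || a == b;
}
```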
On Sun, Nov 13, 2005 at 04:02:27PM -0800, Robert Ramey wrote:
uninitialized bools; float/double NaN, +/- inf, etc.
We hit this problem early on. We're serializing things where NaN is overloaded to mean both NaN and "uninitialized", and where +/- inf are perfectly valid values. We often need to serialize structures between runs that contain sections that haven't been initialized. A couple of our platforms have exactly this bug.
I'll dig up my changes and send them to you. IIRC they were three or four lines each.
How about adding a little bit to one of the files in serialization/test to test these things? Send that along with your changes.
But of course, you get no code from me w/o tests. :)
Probably adding a couple of variables to A.hpp will be sufficient. Truth is, I don't even know how one goes about assigning a NaN or a +/- inf to a float/double variable !!
I just had a thought: we'd rather not know how the stream stores these things; we just want them to work, round-trip. It should be as standard as possible, but (IIUC) the standard is fuzzy here, or at least its implementations are. So we're talking about code like this (basic_text_iprimitive):
    void load(double & t)
    {
        if(is.fail())
            boost::throw_exception(archive_exception(archive_exception::stream_error));
        char c = is.peek();
        while (c == ' ' || c == '\t' || c == '\n') // munch leading whitespace
        {
            is.get();
            c = is.peek();
        }
        if (c == 'n') // nan
        {
            t = NAN; // you can get one of those with 0.0/0.0, robert...
            if (is.get() != 'n' || is.get() != 'a' || is.get() != 'n')
                boost::throw_exception(archive_exception(archive_exception::stream_error));
            return;
        }
        if (c == 'i') // positive inf
        {
            //etc, etc
It's not pretty, and certainly not fast. If you want to read positive infinity as "inf" and negative infinity as "-inf" (which is what comes out if you write them), you have to get and push back that minus sign.
Use case: I dump data structures to XML with boost::serialization and then I want to pick through them later with some homebrew utility, like a little Python GUI thing that makes graphs. I will have to understand serialization's strategy for reading/writing these cases and recode it myself... not good.
Also, the save routine contains code like:
    os << std::setprecision(std::numeric_limits<float>::digits10 + 2);
which is also right there in lexical_cast.hpp. In addition, in serialization you'll see code like:
    void load(unsigned char & t)
    {
        if(is.fail())
            boost::throw_exception(archive_exception(archive_exception::stream_error));
        unsigned short int i;
        is >> i;
        t = static_cast<unsigned char>(i);
    }
where the archive is getting pretty up close and personal with the types. Would it be too much to ask that lexical_cast<> handle these NaN-type situations, and that the archive delegate the job to lexical_cast<> (maybe even for most or all PODs)? -t
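A self-contained sketch of the symmetric approach troy describes, under stated assumptions: this is not the serialization library's code, the names save_double/load_double are hypothetical, NaN and the infinities are written as the tokens "nan", "inf" and "-inf", and values are assumed to be whitespace-separated, as in a text archive. Reading a whole token first sidesteps the peek-and-push-back problem with the minus sign.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical text-archive helper: special values become fixed tokens.
void save_double(std::ostream& os, double t) {
    if (std::isnan(t)) {
        os << "nan";
    } else if (std::isinf(t)) {
        os << (t > 0 ? "inf" : "-inf");
    } else {
        // max_digits10 (17 for IEEE double) digits round-trip exactly.
        // Note: this leaves the precision set on the stream.
        os << std::setprecision(std::numeric_limits<double>::max_digits10) << t;
    }
}

// Hypothetical counterpart: read one whole token, then interpret it.
double load_double(std::istream& is) {
    std::string token;
    if (!(is >> token))  // operator>> skips leading whitespace itself
        throw std::runtime_error("stream error");
    if (token == "nan")  return std::numeric_limits<double>::quiet_NaN();
    if (token == "inf")  return std::numeric_limits<double>::infinity();
    if (token == "-inf") return -std::numeric_limits<double>::infinity();

    std::istringstream num(token);
    double t;
    // Reject tokens that are not entirely a number, e.g. "1.5x".
    if (!(num >> t) || num.peek() != std::char_traits<char>::eof())
        throw std::runtime_error("stream error");
    return t;
}
```

Because the tokens are chosen on output, the reader no longer depends on how a particular standard library happens to print or parse NaN and infinity.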
troy d. straszheim wrote:
    void load(double & t)
    {
        if(is.fail())
            boost::throw_exception(archive_exception(archive_exception::stream_error));
        char c = is.peek();
        while (c == ' ' || c == '\t' || c == '\n') // munch leading whitespace
        {
            is.get();
            c = is.peek();
        }
        if (c == 'n') // nan
        {
            t = NAN; // you can get one of those with 0.0/0.0, robert...
            if (is.get() != 'n' || is.get() != 'a' || is.get() != 'n')
                boost::throw_exception(archive_exception(archive_exception::stream_error));
            return;
        }
        if (c == 'i') // positive inf
        {
            //etc, etc
Note that formatted text-stream input skips leading whitespace automatically, so this could be shortened considerably. Robert Ramey
On Mon, Nov 14, 2005 at 08:12:18AM -0800, Robert Ramey wrote:
troy d. straszheim wrote:
    void load(double & t)
    {
        if(is.fail())
            boost::throw_exception(archive_exception(archive_exception::stream_error));
        char c = is.peek();
        while (c == ' ' || c == '\t' || c == '\n') // munch leading whitespace
        {
            is.get();
            c = is.peek();
        }
        if (c == 'n') // nan
        {
            t = NAN; // you can get one of those with 0.0/0.0, robert...
            if (is.get() != 'n' || is.get() != 'a' || is.get() != 'n')
                boost::throw_exception(archive_exception(archive_exception::stream_error));
            return;
        }
        if (c == 'i') // positive inf
        {
            //etc, etc
Note that formatted text-stream input skips leading whitespace automatically, so this could be shortened considerably.
Did it arrive with the wrapping all screwed up like that? If so, sorry. Anyhow, the question was whether factoring that all out into lexical_cast<> (or something similar but located inside the serialization library) makes any sense. -t
On 11/13/05 1:26 PM, "Robert Ramey"
Daryle Walker wrote:
In this case, we would have a bug in the decoding and encoding routines. The bug would be that they don't match. If the coding routines are calling the standard library (like I think they are for text archives of primitive types), then the bug is from the standard library not being symmetric. I think the standard library is supposed to give symmetric text I/O.
I don't know what the standard library is supposed to do. But the fact is that at least some implementations of the standard library are not handling text i/o symmetrically in at least two cases:
uninitialized bools; float/double NaN, +/- inf, etc.
But it is never legal to push uninitialized variables through an output system, text or binary. Faulting the library for that is a severe stretch. It's not something that can be worked on, unlike the NaN case.
So how much effort should we put into working around such bugs?
Actually, the effort for uninitialized bools is pretty trivial, and I've incorporated an assertion into the appropriate spot.
But it's non-portable. Your environment lets you get away with it. What about the user of another environment where the uninitialized read does cause a crash & burn? What if the variable's bit pattern just happens to match a valid state? Sometimes the simplest solution isn't the best.
For the others, it's a little more work. If someone has enough interest to actually make and test the changes, I'll be happy to receive them, check them, and incorporate them into the code. I would expect that only some small changes in ??text_i/oprimitive would be necessary. Oh, it would be a bad idea to post them to the list. Personally, I'm of the view that trying to serialize a NaN would probably be a bug in user code. I'm aware that not everyone would agree with this. Maybe throwing an exception could be enabled/disabled with another flag applied at archive open time (like no_header, etc.). Anyway, it's not a big issue for me. Presumably, if someone has interest, he can submit his improvements and we can discuss it then.
And you think serializing an uninitialized value isn't a bug?!
Reading from an uninitialized variable, like what could happen in the original case during encoding, is not a problem any library can fix. The programmer just has to be non-sloppy.
lol - and furthermore, if we catch him doing something like this, we should make sure we don't tell him, so he gets his deserved punishment!!!!
It isn't a matter of "should," but "could." We cannot portably warn the user since undefined behavior doesn't have to play along with your "resolution" code.
I think in the case reported by Paul, he's not necessarily using the uninitialised value, as it's an object that is kind of like a discriminated union. I think this usage parallels the idea of NaNs, etc. in floating point. I'd expect these to be read back in too, as you suggested.
The problems are not in parallel. For a discriminated union, it is the responsibility of the coding author to determine which fields are active and only read/write those particular fields and skip the inactive fields. NaN values are unusable only from a high-level perspective; from a low-level view, such values are still valid objects. (And the high-level view is just an opinion; some programmers might want to keep NaNs around as a flag.)
This articulates my view. I used the term "overloading" as in semantic overloading, where we might use NaN to mean something specific. I can see where this might be useful in some narrow contexts, but I would generally consider it an error-prone practice. Just one man's opinion.
The problem is that the "invalidity" of NaNs is at the semantic level. Objects with a NaN value are still valid objects at the base level. You would have to add some sort of censoring to your framework to make sure NaNs don't get through. And do you hard-wire this to IEEE floats, or generalize it to any type with "invalid" values? But the main invalidity test is that its I/O is asymmetric. Such a test would have to be hacked in for each environment (compiler, library, OS, and HW combination). Weren't we supposed to be writing less serialization code, not more? -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
"Robert Ramey"
David Abrahams wrote:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
What do you mean, "an assertion is thrown?" Only exceptions can be thrown.
I mean the assert macro is passed a value of false.
This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
Hmm - here I have a situation where I can trap an error committed when an archive is saved that would not otherwise be detected until the archive is loaded. The fact that I can't trap it for all data types doesn't mean one should pass up the opportunity for bools. Since it's an assert, it will have no detrimental effect on the runtime of release build code. What is the downside of doing this?
Not much:
- it's something you have to maintain
- it may lead users into a false sense of security.
it would be an unfortunate choice. Non-signalling NaNs should be serializable.
Hmm - that sounds like a matter of opinion to me.
I guess if you hold the opposite opinion it's a matter of opinion. Nonetheless, it's true. NaNs are not just invalid numbers. They can be used intentionally (we had to represent "no data" in a time series for one client, and NaNs worked like a charm for that purpose), and they are meaningful results of some floating-point calculations.
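As an illustration of the intentional use described here (a sketch; the function name is hypothetical), NaN can mark "no data" samples in a time series, and consumers skip them with std::isnan:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical: NaN marks a missing sample; real values are averaged.
// Returns NaN when every sample is missing (or the series is empty).
double mean_ignoring_missing(const std::vector<double>& series) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double x : series) {
        if (std::isnan(x)) continue;  // skip "no data" entries
        sum += x;
        ++count;
    }
    return count ? sum / count : std::numeric_limits<double>::quiet_NaN();
}
```

A series like this only survives serialization if every NaN entry comes back as a NaN on load.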
This would be a good thing from my standpoint, as up until now a NaN could be saved but not recovered, since the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
If indeed it is a problem.
I don't know what to say. If I, as a client of your library, say the lack of this functionality would represent a problem for me, and you doubt the truth of it, it seems the discussion is over.
Meanwhile the binary input just loads the bits, whatever they were. This conflicts with my goal of making all archives behave alike.
...
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!".
Damn straight.
I'm inclined to reject this characterisation basically because doing so will make my life easier. This will trap some users' errors at the cost of prohibiting some behavior that could be defended as correct.
What I'm curious about is what is the utility of serializing a NaN? Why would someone want to do this?
So that when they deserialize, they get the same data back that they started with. It seems pretty obvious to me.
What does it usually mean?
It could mean a variety of things, including that someone divided zero by zero.
The only thing that occurs to me is that it would be uninitialized data.
It's *highly* unlikely that any given random sequence of bits will turn out to be a valid NaN.
If NaN has been overloaded with some sort of meaning like "undetermined value" or something like that
Overloaded? What do you mean by that?
I would think its a questionable and error prone practice.
What is a questionable and error prone practice?
If that's the case, I don't see that it's a bad thing if the library fails to support it.
This seems to be an area in which you have little or no experience. It seems as though because you don't really understand NaNs, you're loath to support them. Why not trust the people that actually know how NaNs work and can be used instead?
So far only one user has raised the issue of having serialized a NaN and having it trap on reading back the archive. I don't read a whole lot into this, as this would only occur in text and XML archives, and perhaps others who do this are using binary archives. But it does suggest that this isn't a huge issue for people actually using the library.
Maybe that's because people in the numerics community, who use NaNs, have other needs not addressed by the library that make rapid adoption unlikely.
I find it very distressing that you keep dismissing the needs of the numerics community.
I have not done this. And I resent the accusation that I have.
Your responses to Matthias about fast array serialization certainly gave me the impression that you're brushing off valid arguments in a very cavalier manner. Further, IIRC, you ignored his last question: Shall the serialization library documentation encourage the implementors of those functions to make use of array serialization features when available, or not?
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
Hmm - I can't imagine why anyone would want to do that. Someone might want to make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
We are working on a library for parallel computing that uses the serialization library to ship data via MPI. If you reject the proposed integration of fast array serialization into the library, we will need to encourage 3rd-party authors of serialization for types containing arrays to use some non-standard fast mechanism, or clients of our library will complain to us that performance with these 3rd-party types is unacceptable, and there will be no clean way to fix them after the fact. Every way we can think of to accomplish that eventually leads to something that feels like "hijacking" the library. We _really_ don't want to do that. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Your responses to Matthias about fast array serialization certainly gave me the impression that you're brushing off valid arguments in a very cavalier manner. Further, IIRC, you ignored his last question:
Shall the serialization library documentation encourage the implementors of those functions to make use of array serialization features when available, or not?
This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
Hmm - I can't imagine why anyone would want to do that. Someone might want to make their own derivation(s) of one or more archive classes or even a whole new archive class - but that doesn't represent any conflict with the current library.
We are working on a library for parallel computing that uses the serialization library to ship data via MPI. If you reject the proposed integration of fast array serialization into the library, we will need to encourage 3rd-party authors of serialization for types containing arrays to use some non-standard fast mechanism ...
Of course, this is what I haven't seen. I would expect that a "fast...archive" would include the specific functionality for certain types. Then users of those types with those archives would automatically benefit from the specialized functionality while preserving compatibility with other archive types. This has been done before - for example, strings are serialized differently in some archives as opposed to others. I haven't seen anything different about this situation.
Every way we can think of to accomplish that eventually leads to something that feels like "hijacking" the library. We _really_ don't want to do that.
OK - I'll take another look at Matthias's code and see what can be done. Robert Ramey
"Robert Ramey"
We are working on a library for parallel computing that uses the serialization library to ship data via MPI. If you reject the proposed integration of fast array serialization into the library, we will need to encourage 3rd-party authors of serialization for types containing arrays to use some non-standard fast mechanism ...
Of course, this is what I haven't seen.
What haven't you seen?
I would expect that a "fast...archive" would include the specific functionality for certain types.
Like what types? double for example? What interface would you propose for serializing an array of doubles?
Then users of those types with those archives would automatically benefit from the specialized functionality
Not unless the authors of serialize functions all use the specialized functionality when it's available.
while preserving compatibility with other archive types. This has been done before - for example, strings are serialized differently in some archives as opposed to others. I haven't seen anything different about this situation.
Strings are a closed set of types. In this situation we have a large _category_ of otherwise-unrelated types that can benefit from one particular optimization. But that benefit can only accrue if the authors of serialization functions use a specialized interface for array serialization whenever possible.
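A rough sketch of the kind of opt-in interface under discussion follows. Every name here (save_array, supports_fast_array, the two toy archive types, samples) is hypothetical and not the serialization library's API. A serialize-style function calls a single array entry point, and tag dispatch on a per-archive flag decides whether the data goes out as one contiguous block or element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Toy archive that can write a contiguous block in one call (binary, MPI...).
struct fast_archive {
    static constexpr bool supports_fast_array = true;
    std::size_t block_writes = 0, element_writes = 0;
    void save_binary(const void*, std::size_t) { ++block_writes; }
    void save(double) { ++element_writes; }
};

// Toy archive that can only write one value at a time (text, XML...).
struct text_like_archive {
    static constexpr bool supports_fast_array = false;
    std::size_t element_writes = 0;
    void save(double) { ++element_writes; }
};

template <class Archive>
void save_array_impl(Archive& ar, const double* p, std::size_t n, std::true_type) {
    ar.save_binary(p, n * sizeof(double));  // one contiguous write
}

template <class Archive>
void save_array_impl(Archive& ar, const double* p, std::size_t n, std::false_type) {
    for (std::size_t i = 0; i != n; ++i) ar.save(p[i]);  // element by element
}

// Single entry point for serialize() authors; the archive picks the strategy.
template <class Archive>
void save_array(Archive& ar, const double* p, std::size_t n) {
    save_array_impl(ar, p, n,
                    std::integral_constant<bool, Archive::supports_fast_array>());
}

// A third-party type whose save function uses the array entry point.
struct samples {
    std::vector<double> values;
    template <class Archive>
    void save_to(Archive& ar) const { save_array(ar, values.data(), values.size()); }
};
```

Dave's point is visible in samples::save_to: the speed-up only reaches third-party types whose authors call the array entry point instead of looping over elements themselves.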
Every way we can think of to accomplish that eventually leads to something that feels like "hijacking" the library. We _really_ don't want to do that.
OK - I'll take another look at Matthias's code and see what can be done.
Thanks, I really appreciate it. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
"Robert Ramey"
writes:
David Abrahams wrote:
This would be a good thing from my standpoint, as up until now a NaN could be saved but not recovered, since the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
If indeed it is a problem.
I don't know what to say. If I, as a client of your library, say the lack of this functionality would represent a problem for me, and you doubt the truth of it, it seems the discussion is over.
I haven't yet read the later messages, but from my point of view (a typical user), the bottom line *must* be:
1. If you can legitimately write it to an archive, you *must* be able to read it back in. If not, the archive is useless, and the library is buggy.
2. Conversely, if you cannot legitimately read it, you must *not* allow it to be written out - throw an exception or handle it in some other way, but you *must* advise the user somehow. If not, the library is buggy.
This applies to NaNs as much as to uninitialised booleans or anything else. The weight of argument so far seems to be that allowing NaNs to be written and read is a good thing. I'm sure it wouldn't be too hard to get the deserialisation code to parse NaN as a legitimate value for a floating-point variable. Paul
Paul Giaccone wrote:
I haven't yet read the later messages, but from my point of view (a typical user), the bottom line *must* be:
1. If you can legitimately write it to an archive, you *must* be able to read it back in. If not, the archive is useless, and the library is buggy.
2. Conversely, if you cannot legitimately read it, you must *not* allow it to be written out - throw an exception or handle it in some other way, but you *must* advise the user somehow. If not, the library is buggy.
I've now read the rest of the thread and appreciate that the above requirements might be too idealistic and not necessarily practicable on all operating systems. In any case, however this issue is resolved practically, I think it is important that it goes into the user documentation, including whether or not the behaviour is supported but also on which platforms, if not all of them. I see that there is already discussion under "Archive Exceptions" of what can cause each of the exceptions, which is great. Could you add to the discussion of "stream_error" that the user should check to see that all archived variables are initialised? This would be useful, and it throws the responsibility back to the user. Thanks, Paul
On 11/14/05 4:45 AM, "Paul Giaccone"
I haven't yet read the later messages, but from my point of view (a typical user), the bottom line *must* be:
1. If you can legitimately write it to an archive, you *must* be able to read it back in. If not, the archive is useless, and the library is buggy.
2. Conversely, if you cannot legitimately read it, you must *not* allow it to be written out - throw an exception or handle it in some other way, but you *must* advise the user somehow. If not, the library is buggy.
(The following is only for initialized objects.) But how do you implement this? We would have to implement some sort of censoring hook to catch the "bad" values. What if you really wanted that value serialized anyway? How do you know that a value is unserializable? The main factor seems to be the I/O system's quality, not the object's type. How do we deal with the fact that said quality may vary per environment, so each computer may censor different values? Each object serialized now gets an extra "if"-branch, to execute the censor object; how much will the "if" check, for both approved and rejected values, slow down the streaming?
This applies to NaNs as much as to uninitialised booleans or anything else. The weight of argument so far seems to be that allowing NaNs to be written and read is a good thing. I'm sure it wouldn't be too hard to get the deserialisation code to parse NaN as a legitimate value for a floating-point variable.
The issues aren't really the same. Uninitialized objects are a red herring; manipulating them, besides initialization, is illegal. The fact that the objects are "bool" is another red herring; the type doesn't matter. This kind of error can't be portably checked, and the energy towards this effort should be directed away from giving this technique any smell of legitimacy and toward improving programmers that use bad techniques. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
[I was going to write a similar response, but decided to crib off this one instead. (So I generally agree with Mr. Abrahams.)]
On 11/10/05 4:53 PM, "David Abrahams"
"Robert Ramey"
writes:
Nigel Rantor wrote:
If Robert can find a nice way of trapping uninitialised variables (not simply bools, of course) and throw exceptions for these, then that's great. I'm not sure that is entirely possible though. (And I haven't spent any time thinking about it either.)
I checked my reference on the subject of bool/int conversion. This is Section 4.2, "Booleans", in "The C++ Programming Language" by Stroustrup.
I see that "in arithmetic or logical expressions, bools are converted to ints... if the result is converted back to bool, a 0 is converted to false and a non-zero value is converted to true".
Looking at the above
Do not look at the above. It contradicts the standard, as shown:
5.14 Logical AND operator (clause 5, Expressions)
1 The && operator groups left-to-right. The operands are both implicitly converted to type bool (clause 4)...
And the conditions in the quote are ambiguous. There are really arithmetic, shift, relational, bit-wise, and logical categories to consider. For all operators but logical, any "bool" argument gets promoted to "int." For the logical operators, any non-Boolean argument gets converted to "bool." As far as the result is concerned, the relational and logical operators DIRECTLY return a "bool"; it isn't "converted back." (The result of the other operators matches whatever the [first] argument(s) is, after any promotions.)
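Daryle's breakdown can be checked at compile time. A small C++11 sketch (as_int is a hypothetical helper): arithmetic and bitwise operators promote bool operands to int, relational and logical operators yield bool directly, and converting an initialized bool to int gives exactly 0 or 1.

```cpp
#include <cassert>
#include <type_traits>

// Arithmetic and bitwise operators promote bool operands to int.
static_assert(std::is_same<decltype(true + true), int>::value, "bool + bool is int");
static_assert(std::is_same<decltype(true & false), int>::value, "bool & bool is int");

// Relational and logical operators yield bool directly; the logical
// operators first convert non-bool operands to bool.
static_assert(std::is_same<decltype(1 < 2), bool>::value, "relational yields bool");
static_assert(std::is_same<decltype(3 && 4), bool>::value, "logical yields bool");

// A properly initialized bool converts to exactly 0 or 1.
constexpr int as_int(bool b) { return b; }
static_assert(as_int(true) == 1 && as_int(false) == 0, "bool -> int is 0 or 1");
static_assert(true + true == 2, "result stays int; it is not converted back");
```

None of this says anything about an uninitialized bool: reading one is undefined behavior before any conversion rule applies.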
You don't think compiler implementors refer to TC++PL when deciding how to write their compilers, do you?
and having considered the postings on the thread my inclination is to do the following:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
The compiler can only give 0 or 1; it cannot canonically give you another result. As I said in another post, trying this on an uninitialized "bool" has already given you undefined behavior, so you don't know if you'll reach your assertion code, let alone trigger it. (Also the 0 or 1 from the conversion doesn't tell you anything about the internal representation.)
What do you mean, "an assertion is thrown?" Only exceptions can be thrown.
This is in line with my view of trapping a program as soon as possible when any kind of programming error has been detected.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
I agree, especially since I think that such detection isn't possible.
This naturally suggests the following for floats and doubles.
Nothing natural about the following, since the logic was flawed.
* any attempt to save floats or doubles for which the result of isnan(..) is false will trap with an assertion.
So, you're going to assert whenever someone tries to save a non-NaN? That sounds pretty useless. Even if you meant the opposite, it would be an unfortunate choice. Non-signalling NaNs should be serializable.
A not-a-number value is not the same as an uninitialized one. From the perspective of the C++ object model, a built-in floating-point variable having a not-a-number state is still considered initialized. (There may be bit patterns that don't match any valid state for a float, not even NaN.)
This would be a good thing from my standpoint, as up until now a NaN could be saved but not recovered, since the standard text stream input chokes on the NaN text.
Seems like you should work around that problem.
Another post in this thread mentioned another thread from a year ago. That thread said that some environments do provide I/O that can save and load NaN (and infinite) values to/from text. Mr. Ramey was concerned with the environments that couldn't. At this point, I'll propose not worrying about such environments, even if they're popular. You can't give tremendous effort to support all the brokenness out there (and I hope that we don't expect you to). Sometimes you have to tell the user that they're out of luck with that particular configuration. And, AFAIK, not even text archives are 100% portable between environments, are they?
Meanwhile the binary input just loads the bits, whatever they were. This conflicts with my goal of making all archives behave alike.
Good, then I'll (foolishly) assume you'll make NaNs serializable everywhere...
I'm concerned that someone will stand up and shout "But NaN is a valid value for a float or double and I should be able to serialize that!!!".
Damn straight.
I'm inclined to reject this characterisation basically because doing so will make my life easier. This will trap some users' errors at the cost of prohibiting some behavior that could be defended as correct.
When I said that sometimes you have to give up, I only mean in terms of environments too broken to give you the help you need. You shouldn't (permanently) drop features just to make your "life easier" if they're the right thing to do.
I find it very distressing that you keep dismissing the needs of the numerics community. This is likely to lead to something very distasteful, like a semi-official branch of the serialization library that works the way we need it to.
Also, this route needs (yet another) special case and more code, and I think we should minimize that. (And remember that not all floats are IEEE, or even have NaNs!) Anyway, do we provide a way for a (user-created) serialization routine to bail if the archive gives bad values (like it's corrupt or from another platform)? -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker wrote:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
The compiler can only give 0 or 1, it cannot canonically give you another result.
Hmmm - well, it's doing so in this case - canonically or not?
As I said in another post, trying this on an uninitialized "bool" has already given you undefined behavior, so you don't know if you'll reach your assertion code, let alone trigger it. (Also the 0 or 1 from the conversion doesn't tell you anything about the internal representation.)
Hmmm - suppose I implement the following code:
    bool tf;
    int i = tf;
    assert(0 == i || 1 == i);
and the assertion is invoked. Is it not correct to conclude that there is a problem with the code and that it should be somehow altered? Of course a smart compiler will catch this sooner. In other cases a smart runtime system/machine might throw a data exception the instant that i = tf is executed. Great - we've trapped a bug in user code. But suppose that doesn't happen and the assertion is invoked. Well, great again - for the same reason. Suppose that the assertion isn't invoked (maybe the compiler "helps us out" by initializing tf). Then we're stuck with a bug - same as before. So the assert might not be necessary, but it might detect a bug not otherwise detected. The fact that it might not on some platforms doesn't mean that it shouldn't be included.
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
I agree, especially since I think that such detection isn't possible.
Same old question - what's the downside of detecting fundamental brokenness? Sounds like a good feature to me !!!
Another post in this thread mentioned another thread from a year ago. That thread said that some environments do provide I/O that can save and load NaN (and infinite) values to/from text. Mr. Ramey was concerned with the environments that couldn't. At this point, I'll propose not worrying about such environments, even if they're popular. You can't give tremendous effort to support all the brokenness out there (and I hope that we don't expect you to). Sometimes you have to tell the user that they're out of luck with that particular configuration.
lol - well, you've got my vote. It turns out that this is an artifact of certain (all?) text streams, so it doesn't appear in the binary archives. Perhaps this is why it hasn't come up more often.
And, AFAIK, not even text archives are 100% portable between environments, are they?
They are - or at least they are meant to be. Any instance where they are not should be reported as a bug. Having said that, there are going to be some cases where they are not going to be portable. Suppose the following:
a) one serializes a variable of type size_t to a text archive;
b) the value exceeds 32 significant bits;
c) the archive is shipped to another machine which has a size_t only 32 bits long.
As the archive is read by the receiving machine, an error should be detected and an exception thrown when the value is read. So although text archives are portable, one can still only reconstruct C++ data which is in fact representable on the target machine. I don't know if that decreases your 100% number - but it's the best we can hope for.
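The size_t scenario can be sketched as a range-checked load. This is a hypothetical helper, not the library's implementation: it reads the text into the widest standard unsigned type and throws, rather than silently truncating, when the value does not fit the target type.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>
#include <stdexcept>

// Hypothetical: load an unsigned value that may have been written on a
// platform where the type is wider than it is here.
template <class T>
T load_unsigned_checked(std::istream& is) {
    static_assert(std::numeric_limits<T>::is_integer && !std::numeric_limits<T>::is_signed,
                  "unsigned integer types only");
    unsigned long long wide;
    if (!(is >> wide))
        throw std::runtime_error("stream error");
    if (wide > std::numeric_limits<T>::max())
        throw std::range_error("value not representable on this platform");
    return static_cast<T>(wide);
}
```

Here std::uint32_t stands in for a 32-bit size_t; the first value below fits exactly, and the next one (2^32) is rejected.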
When I said that sometimes you have to give up, I only mean in terms of environments too broken to give you the help you need. You shouldn't (permanently) drop features just to make your "life easier" if they're the right thing to do.
The question is how much effort should be directed at working around a library bug (if that's what it is) in a case which is so unimportant or trivial to address that even those people directly affected by it are disinclined to invest any effort in it. It's not that hard to fix if someone really needs it. So if it bothers you and you need to, fix it, test it, and send me your changes.
Anyway, do we provide a way for a (user-created) serialization routine to bail if the archive gives bad values (like it's corrupt or from another platform)?
Archive class implementations throw documented exceptions in cases where corrupted data is detected. Serialization functions are totally in the hands of the user who implements them and can also throw any of these exceptions - or their own if they prefer. Robert Ramey
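As a sketch of the user-side half of Robert's answer, here is a hand-written load routine that validates what it reads and throws its own exception type when the data parses but breaks an invariant. `Date`, `invalid_date`, and `load_date` are hypothetical names, and a plain `std::istream` stands in for a Boost archive.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>

// Hypothetical user type with an invariant: month must be in 1..12.
struct Date {
    int year = 1970;
    int month = 1;
};

// Hypothetical user-defined exception for data that parses but is invalid.
struct invalid_date : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// User-written load routine: the stream only supplies the numbers;
// checking that they make sense is the user's responsibility.
void load_date(std::istream& in, Date& d) {
    int y = 0, m = 0;
    if (!(in >> y >> m)) {
        throw std::runtime_error("stream error while reading Date");
    }
    if (m < 1 || m > 12) {
        throw invalid_date("month out of range");
    }
    // Only commit the values once they have been validated.
    d.year = y;
    d.month = m;
}
```

Committing the fields only after validation means a failed load leaves the target object unchanged.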
Robert Ramey wrote:
Daryle Walker wrote:
As I said in another post, trying this on an uninitialized "bool" has already given you undefined behavior, so you don't know if you'll reach your assertion code, let alone trigger it. (Also the 0 or 1 from the conversion doesn't tell you anything from the internal representation.)
Hmmm - suppose I implement the following code:
bool tf; int i = tf; assert(0 == i || 1 == i);
and the assertion is invoked. Is it not correct to conclude that there is a problem with the code and that it should be somehow altered?
As Daryle and others have pointed out, citing the C++ standard, accessing an uninitialized bool variable gives one undefined behavior. In the face of that, why do you think that the above 3 lines of code mean anything? Because in X compiler it may work to trigger the assert? That is not the way to program C++ in this case. The assert above is meaningless. Whether it occurs or not is completely random. A program could pass an uninitialized bool variable for serialization, and if you used the code above to test for an uninitialized bool it would only prove that if some compiler set an uninitialized bool variable to a value which did not have a bit pattern of 0 or 1 in that particular situation, the assert would occur. In the meantime, if that same compiler, in some other situation having to do with uninitialized bools, or any other compiler in any situation dealing with uninitialized bools, set the value to have a bit pattern of 0 or 1, the assert would not occur even though the bool would still be uninitialized. Knowing this, what would be the point of such code? Quite simply, there is no knowing what the bit pattern of an uninitialized bool is, so the assert above is just a waste of time. If there were a way of testing for an uninitialized bool in C++, I could see your doing so as a help to programmers who erroneously pass one to the serialization library, although I still think this sort of error is outside the bounds of your library, just as checking for an uninitialized variable of any data type is outside the bounds of your library. But since there is no reliable way of doing this, it should not be done.
On 11/13/05 1:59 PM, "Robert Ramey"
Daryle Walker wrote:
* when booleans are output, booleans are converted to integers and an assertion is thrown if the resulting value is other than zero or one.
The compiler can only give 0 or 1, it cannot canonically give you another result.
Hmmm - well, it's doing so in this case - canonically or not?
I was just talking about defined behavior, not the not-0-nor-1 case you got from undefined behavior. A footnote in the standard mentions that examining a bool in an undefined manner may give neither-True-nor-False states.
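For a bool that actually holds a valid value, the conversion Daryle refers to is fully defined: false becomes 0, true becomes 1, and any nonzero integer converted to bool becomes true. The sketch below (the helper name `to_int` is only for illustration) covers that defined case alone; it says nothing about an uninitialized bool, whose examination is undefined behavior.

```cpp
#include <cassert>

// Converting an initialized bool to int is defined to yield exactly 0 or 1.
int to_int(bool b) {
    return b;  // implicit bool -> int conversion: false -> 0, true -> 1
}
```

Going the other way, `static_cast<bool>(205)` is simply true, so converting it back to int gives 1, never 205.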
As I said in another post, trying this on an uninitialized "bool" has already given you undefined behavior, so you don't know if you'll reach your assertion code, let alone trigger it. (Also the 0 or 1 from the conversion doesn't tell you anything from the internal representation.)
Hmmm - suppose I implement the following code:
bool tf; int i = tf; assert(0 == i || 1 == i);
and the assertion is invoked. Is it not correct to conclude that there is a problem with the code and that it should be somehow altered? Of course a smart compiler will catch this sooner. In other cases a smart runtime system/machine might throw a data exception the instant that i=tf is executed. Great - we've trapped a bug in user code. But suppose that doesn't happen and the assertion is invoked. Well, great again - for the same reason. Suppose that the assertion isn't invoked (maybe the compiler "helps us out" by initializing tf). So we're stuck with a bug - same as before. So the assert might not be necessary, but it might detect a bug not otherwise detected. The fact that it might not on some platforms doesn't mean that it shouldn't be included.
How is the user supposed to know if they're on an environment with your "aid" or one where the aid doesn't work? Are you going to make (and maintain) a list?
Some kinds of programming errors aren't worth trying to detect. Passing uninitialized data is fundamental brokenness, and looking for it only in bools is rather silly.
I agree, especially since I think that such detection isn't possible.
same old question - what's the downside with detecting fundamental brokenness - sounds like a good feature to me !!!
Not if it doesn't work reliably and/or gives the user false hopes. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker wrote:
same old question - what's the downside with detecting fundamental brokenness - sounds like a good feature to me !!!
Not if it doesn't work reliably and/or gives the user false hopes.
Ahhh - here is the crux of the matter. Frequently, I can detect a user error. Sometimes I can't. The fact that sometimes I can't detect it does not argue that I shouldn't raise a flag when I do. Let it be known that: For anyone with the hope that the serialization library can prevent you from creating an erroneous archive - these are false hopes. Robert Ramey
Robert Ramey wrote:
Let it be known that:
For anyone with the hope that the serialization library can prevent you from creating an erroneous archive - these are false hopes.
Right, so could you make this known in the documentation, along with discussion of the cases when this can occur, in order to help users with use of the library and debugging of their code, please? Paul
Paul Giaccone
Robert Ramey wrote:
Let it be known that:
For anyone with the hope that the serialization library can prevent you from creating an erroneous archive - these are false hopes.
Right, so could you make this known in the documentation, along with discussion of the cases when this can occur, in order to help users with use of the library and debugging of their code, please?
Suppose I write a File class. Do I need to say explicitly that if you write random bytes into the middle of a File instance the class can't protect you from corrupting your filesystem? That's the same sort of thing as passing uninitialized data when an interface isn't documented as accepting raw storage; it's always wrong. No component that expects a reference to T should ever have to document that the T must be initialized. IMO, that would be silly. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Paul Giaccone
writes: Robert Ramey wrote:
Let it be known that:
For anyone with the hope that the serialization library can prevent you from creating an erroneous archive - these are false hopes.
Right, so could you make this known in the documentation, along with discussion of the cases when this can occur, in order to help users with use of the library and debugging of their code, please?
Suppose I write a File class. Do I need to say explicitly that if you write random bytes into the middle of a File instance the class can't protect you from corrupting your filesystem?
That's the same sort of thing as passing uninitialized data when an interface isn't documented as accepting raw storage; it's always wrong. No component that expects a reference to T should ever have to document that the T must be initialized. IMO, that would be silly.
No, of course there is no need to say this. The typical programmer is familiar with the concept of GIGO. If you don't initialise your data, you can't expect your program to work, simple as that. Saying "Don't forget to initialize your data before you write it to the archive, guys!" would be silly, as you say. However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written. I/O of uninitialized data is undefined." The idea is not to treat programmers with kid gloves by reminding them to make sure they remember to initialise their variables, but rather to point out to the user another reason why a stream_error might occur (note I say "might", because it seems that there are cases in which uninitialised data written to an archive can in fact be read back in harmlessly, even though this is deprecated) and so assist them with debugging their code. I don't see it as telling users what they should know already; it's simply listing another situation (along with not having a terminating new line or not destroying an output archive on a stream before opening an input one on the same stream) that can cause a stream_error exception to be thrown. Here's why I think this is important. Before using the library, I had written my own serialisation code, and everything seemed to be working fine, even though I had uninitialised variables in my code that I was unaware of at the time. Then I used the serialization library and got this error. This suggested to me that either there was something wrong with the way I was using the library or there might be a bug in the library, rather than suggesting a bug in my code. 
Looking through the documentation, the way I was using the library seemed to be fine, so I started this thread (which seems to be chasing its own tail rather than coming to any practical conclusions). A note to the effect I am proposing would have enabled me to find the problem much more quickly, and would no doubt assist other puzzled users too. Thanks, Paul
Paul Giaccone wrote:
However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written. I/O of uninitialized data is undefined."
Oh - OK - that's easy. Consider it done ! Robert Ramey
Robert Ramey wrote:
Paul Giaccone wrote:
However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written. I/O of uninitialized data is undefined."
Oh - OK - that's easy. Consider it done !
Great! Thanks very much. Hopefully we can put this thread out of its misery now :-) Paul
"Robert Ramey"
Paul Giaccone wrote:
However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written. I/O of uninitialized data is undefined."
Oh - OK - that's easy. Consider it done !
Please, don't use an exception for this purpose, and if for some reason you feel you *must*, please don't document it as such. My posts in http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/800... describe some of the rationale. -- Dave Abrahams Boost Consulting www.boost-consulting.com
Paul Giaccone
However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written.
Undefined behavior can induce any effect at all, so of course it can result in a stream_error exception. However, if the program is going to try to intentionally respond to uninitialized data it should NOT be with an exception, but with an assertion. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Paul Giaccone
writes: However, this is not what I am requesting. What I am asking for is a note in the documentation on the page headed "Serialization - Archive Exceptions" under "stream_error" that says something along the lines of "This exception can also occur when reading an archive to which uninitialized data has been written.
Undefined behavior can induce any effect at all, so of course it can result in a stream_error exception. However, if the program is going to try to intentionally respond to uninitialized data it should NOT be with an exception, but with an assertion.
I just added a note saying that one possible cause of a stream error would be an attempt to store uninitialized data. I had already included the assertion. I hope that can satisfy everyone. Robert Ramey
Nigel Rantor
My understanding of uninitialised variables is that their *values* were undefined, that you could not rely on them to be any particular value, including not being within range for that type.
So, you can read them, but there are no guarantees about what you'll get back. I suppose I agree, it really isn't a bool yet unless you can be assured that it contains true or false, but it does have a value.
No.
4.1 Lvalue-to-rvalue conversion
1 An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
5 Expressions [expr]
8 Whenever an lvalue expression appears as an operand of an operator that expects an rvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to an rvalue.
I don't have time to run down everything relevant right now, but lvalue-to-rvalue conversion happens at the drop of a hat. I'm pretty sure if you trace through it, you'll find out that anything you can do that reads the value of an uninitialized variable will go through an lvalue-to-rvalue conversion. Not to mention Footnote 42, which though non-normative speaks directly to this question:
42) Using a bool value in ways described by this International Standard as "undefined," such as by examining the value of an uninitialized automatic variable, might cause it to behave as if it is neither true nor false.
<caveat> C++ isn't my day job...I just use it for fun things... </caveat>
When bools are used in logical operations they are converted to integers
Can you cite the standard on that one? I'm pretty sure that it's the other way around: when integers are used in logical operations they are converted to bools.
No, but I read it in the fourth para of sec 4.2 of TCPPPL.
You should get a copy of the standard.
"In arithmetic and logical expressions, bools are converted to ints; integer arithmetic and logical operations are performed on the converted values."
5.14 Logical AND operator 5 Expressions 1 The && operator groups left-to-right. The operands are both implicitly converted to type bool (clause 4)...
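The two quotes can be reconciled with a few compile-time checks: arithmetic operators promote bool operands to int, while the logical operators convert their operands to bool and yield bool. The variable names below are only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Logical operators convert their operands to bool and yield bool (5.14).
static_assert(std::is_same<decltype(2 && 3), bool>::value, "&& yields bool");
static_assert(std::is_same<decltype(!5), bool>::value, "! yields bool");

// Arithmetic operators promote bool operands to int (integral promotion).
static_assert(std::is_same<decltype(true + true), int>::value, "bool + bool is int");

// Hence true + true is the int 2, while true && true is just the bool true.
constexpr int arithmetic_result = true + true;
constexpr bool logical_result = true && true;
```

So TC++PL's sentence holds for arithmetic, and Dave's reading of 5.14 holds for `&&` and `||`.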
, so depending on what your bool happens to contain before initialisation it could evaluate to either true or false.
Or it could crash your computer.
Computer or program?
Yes.
Really? Please elaborate, I'm interested.
Undefined behavior means anything can happen. Really.
1.3.12 undefined behavior: behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements.
I don't see how accessing a piece of memory that hasn't been initialised to a legal value would cause that.
It depends on how the platform works. That's allowed under the C++ standard.
Well, how about simply treating anything other than 1 as false?
I don't know what 1 has to do with anything. The values of a bool are true and false. 1 is an int.
I was simply using the OP's terminology, I apologise. s/1/true/ in my above sentence. :-)
Since the two valid values of a bool are true and false, treating anything other than true as false is tautological. -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Nigel Rantor
writes: No, but I read it in the fourth para of sec 4.2 of TCPPPL.
You should get a copy of the standard.
Yep. Any suggestions welcome, is there one canonical reference I should be aware of? Regards, n
Nigel Rantor
David Abrahams wrote:
Nigel Rantor
writes: No, but I read it in the fourth para of sec 4.2 of TCPPPL.
You should get a copy of the standard.
Yep. Any suggestions welcome, is there one canonical reference I should be aware of?
?? The C++ Standard is the canonical reference http://www.jamesd.demon.co.uk/csc/faq.html#B1 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470846747,descCd-tableOf... -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Nigel Rantor
writes: Yep. Any suggestions welcome, is there one canonical reference I should be aware of?
?? The C++ Standard is the canonical reference
http://www.jamesd.demon.co.uk/csc/faq.html#B1 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470846747,descCd-tableOf...
*sigh* Yes, I thought I was being clear, apparently not. I meant out of ALL of the possible references out there is there one above all others that people would recommend. Implicit in this is that all reference books are NOT created equal. Which of the above two would you recommend and why? Any idea regarding the extreme difference in price of them? n
Nigel Rantor
David Abrahams wrote:
Nigel Rantor
writes: Yep. Any suggestions welcome, is there one canonical reference I should be aware of?
?? The C++ Standard is the canonical reference
http://www.jamesd.demon.co.uk/csc/faq.html#B1 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470846747,descCd-tableOf...
*sigh*
Yes, I thought I was being clear, apparently not. I meant out of ALL of the possible references out there is there one above all others that people would recommend.
?? There is only one C++ standard.
Implicit in this is that all reference books are NOT created equal.
Which of the above two would you recommend and why?
?? They are precisely identical in content. The first one comes as a PDF for $18, and the 2nd comes as hardcover book. I like having both at hand because it's nice to be able to search.
Any idea regarding the extreme difference in price of them?
It costs a lot more to print a book than to duplicate a PDF? -- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams wrote:
Nigel Rantor
writes: David Abrahams wrote:
Nigel Rantor
writes: Yep. Any suggestions welcome, is there one canonical reference I should be aware of?
?? The C++ Standard is the canonical reference
http://www.jamesd.demon.co.uk/csc/faq.html#B1 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470846747,descCd-tableOf...
*sigh*
Yes, I thought I was being clear, apparently not. I meant out of ALL of the possible references out there is there one above all others that people would recommend.
?? There is only one C++ standard.
Good lord man, must you be so obtuse? I was merely asking if one was better. If you can't understand that some reference material is of better quality, contains superior examples or includes more engaging sections on rationale/design then I would much rather you say that they are all equal or say nothing at all rather than imply that I believe there to be multiple standards.
Any idea regarding the extreme difference in price of them?
It costs a lot more to print a book than to duplicate a PDF?
No, one comes either as a book for $175 OR as a PDF for $18. The other is only available as a book for £34.00. I was wondering why/how they can get away with $175... You know what forget it, it isn't that important. n
Nigel Rantor
David Abrahams wrote:
Nigel Rantor
writes: David Abrahams wrote:
Nigel Rantor
writes: Yep. Any suggestions welcome, is there one canonical reference I should be aware of?
?? The C++ Standard is the canonical reference
http://www.jamesd.demon.co.uk/csc/faq.html#B1 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470846747,descCd-tableOf...
*sigh*
Yes, I thought I was being clear, apparently not. I meant out of ALL of the possible references out there is there one above all others that people would recommend.
?? There is only one C++ standard.
Good lord man, must you be so obtuse?
I think you'd better take a deep breath. You are failing to grasp the basic facts.
I was merely asking if one was better. If you can't understand that some reference material is of better quality, contains superior examples or includes more engaging sections on rationale/design
No, I am telling you, they are **precisely** identical, aside from scale. You have to blow up the pages of the hardcover book slightly to make them match the pages in the PDF, but aside from that, they are page-for-page and word-for-word identical, down to the smallest details of formatting and layout.
then I would much rather you say that they are all equal or say nothing at all rather than imply that I believe there to be multiple standards.
There is only one document.
Any idea regarding the extreme difference in price of them?
It costs a lot more to print a book than to duplicate a PDF?
No, one comes either as a book for $175 OR as a PDF for $18. The other is only available as a book for £34.00.
Oh, AFAIK the $175 is due to the fact that ANSI can't publish books efficiently and it uses the purchase of hardcover standards to keep the organization running.
I was wondering why/how they can get away with $175...
The book from Wiley only came out a year or two ago; for a while the ANSI document was the only choice.
You know what forget it, it isn't that important.
Well, I did try to help. I'm sorry if it hasn't worked out for you. -- Dave Abrahams Boost Consulting www.boost-consulting.com
On 11/12/05 12:21 PM, "David Abrahams"
Nigel Rantor
writes: David Abrahams wrote:
Nigel Rantor
writes: [SNIP] Any idea regarding the extreme difference in price of them? ["them" is the C++ standard as ANSI's book v. ANSI's PDF v. Wiley's book]
It costs a lot more to print a book than to duplicate a PDF?
No, one comes either as a book for $175 OR as a PDF for $18. The other is only available as a book for £34.00.
Oh, AFAIK the $175 is due to the fact that ANSI can't publish books efficiently and it uses the purchase of hardcover standards to keep the organization running.
Worse, I think the _authors_ also have to pay for the privilege to create standards! Only the publishers-in-the-middle ANSI & ISO get any money. Both readers and writers have to spend it. (Also, PDFs made from a scanning of a book form keep the original price [i.e. expensive, crappy-looking, and unusable]. So the early C standard PDFs were really expensive, until the 1999 version was written electronically. Then the price lowered to the similarly-created C++ standard's price.)
I was wondering why/how they can get away with $175...
The book from Wiley only came out a year or two ago; for a while the ANSI document was the only choice.
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Robert Ramey wrote:
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
Is requiring the value to be 0 or 1 part of the C++ ANSI standard? If you want to try to reproduce the error, the code threw an exception on an object that was allocated on the heap and contained a boolean, which I had not initialised.
Perhaps part of the problem is that I used 0 and 1 for bool variables in order to avoid including the English strings "true" and "false" in text files and to save space.
This makes sense because 0 and 1 are probably what users would expect. Is it however possible to do the equivalent of boolalpha on the stream in order to write and read booleans as strings?
I'll think about this.
Robert Ramey
If handling uninitialised variables is not practical, then perhaps there could be a warning in the documentation that uninitialised booleans will cause stream errors on deserialisation.
Paul
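Paul's boolalpha question can at least be illustrated with plain iostreams: with `std::boolalpha` set, a bool is written as "true"/"false" and only those words are accepted on input. Whether a given Boost archive honors this flag is not established in this thread, so treat the sketch as standard-stream behavior only; the helper names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Write a bool as the word "true" or "false" instead of 1 or 0.
std::string write_bool_alpha(bool b) {
    std::ostringstream out;
    out << std::boolalpha << b;
    return out.str();
}

// Read it back with boolalpha set; anything other than "true"/"false" fails.
// 'ok' reports whether the extraction succeeded.
bool read_bool_alpha(const std::string& text, bool& ok) {
    std::istringstream in(text);
    bool b = false;
    in >> std::boolalpha >> b;
    ok = !in.fail();
    return b;
}
```

The trade-off is the one Robert mentions: words are more readable and self-describing, while 0 and 1 are shorter.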
Paul Giaccone
Robert Ramey wrote:
Paul Giaccone wrote:
For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Hmmm, I'm not sure about this. Do we know for a fact that a bool variable will always contain 1 or 0? I've never seen code trap on an un-initialized bool. It seems that even an uninitialized bool corresponds to true or false.
Is requiring the value to be 0 or 1 part of the C++ ANSI standard?
No, a bool has a value of true or false. 0 and 1 are integer values.
If you want to try to reproduce the error, the code threw an exception on an object that was allocated on the heap and contained a boolean, which I had not initialised.
All bets are off then; you have no right to complain about anything that happens afterwards. When an object containing an uninitialized member is copied (as occurs when an exception is thrown), you get undefined behavior.
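The practical fix for the heap-allocated object is to make sure its bool can never be left indeterminate. One way (using C++11 default member initializers, which postdate this thread) is sketched below; `Settings` is a hypothetical stand-in for Paul's class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a heap-allocated class with a bool member.
// The default member initializers mean no constructor path can leave
// these members indeterminate, so copying or serializing them is defined.
struct Settings {
    bool enabled = false;
    int count = 0;
};

std::unique_ptr<Settings> make_settings() {
    return std::make_unique<Settings>();  // members take their initializers
}
```

Because every member starts with a valid value, copies (such as those made when an exception is thrown) and writes to an archive never read indeterminate data.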
Perhaps part of the problem is that I used 0 and 1 for bool variables in order to avoid including the English strings "true" and "false" in text files and to save space.
ar << (b ? 't' : 'f');
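Dave's one-liner stores a bool as a single character. A round-trip sketch follows, using plain `std::ostream`/`std::istream` in place of a Boost archive (an assumption made to keep the example self-contained); `save_bool` and `load_bool` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>

// Store a bool as 't' or 'f', following the one-line suggestion above.
void save_bool(std::ostream& out, bool b) {
    out << (b ? 't' : 'f');
}

// Read it back, rejecting any character other than 't' or 'f'.
bool load_bool(std::istream& in) {
    char c = 0;
    if (!(in >> c)) {
        throw std::runtime_error("stream error: no bool to read");
    }
    if (c == 't') return true;
    if (c == 'f') return false;
    throw std::runtime_error("stream error: expected 't' or 'f'");
}
```

Note that `save_bool` still evaluates `b`, so passing it an uninitialized bool is still undefined behavior; the format saves space but cannot make that case well-defined.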
This makes sense because 0 and 1 are probably what users would expect. Is it however possible to do the equivalent of boolalpha on the stream in order to write and read booleans as strings?
-- Dave Abrahams Boost Consulting www.boost-consulting.com
[Sorry for the late response.]
BTW, please put in an actual subject relating your specific concern. All I see in my e-mail client are our tags.
On 11/9/05 1:02 PM, "Paul Giaccone"
I am serialising data structures that include objects of the form:
std::vector
where MyClass contains simple types and further classes.
This would not deserialise correctly in Boost 1.33.0 when in XML format, and I think there was discussion of there being a bug in deserialisation of vectors in that version of the serialization library.
The bug persisted with the new version of Boost (not 1.33.1 that has just been released but a version from the CVS from a few days ago, which is probably close to or the same as 1.33.1 anyway).
I tracked it down to the serialisation of an uninitialised boolean, to which Visual Studio .NET 7.1 had given the garbage value 205. The serialization library generates a stream error when this is read back into a boolean, naturally enough, because it is not a valid value for a boolean.
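The read-back failure Paul describes matches standard stream behavior: with the default (noboolalpha) formatting, `operator>>` for bool parses an integer and accepts only 0 or 1, setting failbit for any other number such as 205. That a Boost text or XML archive extracts bools this way and reports failbit as `stream_error` is an assumption here; the sketch shows only the `std::istream` side, and `parses_as_bool` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>

// Returns true if the text can be extracted into a bool with default flags.
// Only "0" and "1" succeed; other integers set failbit on the stream.
bool parses_as_bool(const char* text) {
    std::istringstream in(text);
    bool b = false;
    in >> b;
    return !in.fail();
}
```

This only explains why a stored 205 cannot be read back; producing the 205 in the first place was already undefined behavior.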
Is this uninitialized Boolean part of Boost's serialization code, or is it part of your class (called "MyClass" here)? The person responsible for this bug is the one who owns the code at fault. Did you check if MyClass's serialization works on a single object? Did you check if serialization works on non-XML archives, both single objects and your special vector?
My question is whether the serialization library should be made more robust to handle uninitialised variables. In the case of all other variables, of course, it is not easy or not possible to detect in a simple manner whether or not a variable is initialised: is that integer meant to have value 1.23456E+66 or is it just uninitialised? For booleans, though, a value of other than 0 or 1 means it has not been initialised, and perhaps this should throw an exception on writing to the archive rather than on reading from it.
Reading an uninitialized variable that isn't an "unsigned char" (or an array of such) is undefined behavior, no matter what. Such readings can (and have on some platforms) cause a trap/crash. It is not the responsibility of the serialization routines to muddle through uninitialized variables. It's impossible anyway:
1. How is the serialization code supposed to know if the variable is uninitialized? If your code has a flag indicating whether or not the variable is initialized, just redirect your efforts into making the variable always initialized (and remove the flag)!
2. Even if detection was possible, what policy were you going to have if an uninitialized value was found? And how would you enforce it?
3. And how do you know that a "1.23456E+66" integer value is an illegal value? By reading it (and causing the problems in [1])? What about values of "0" or "1" from a Boolean? You made a really big and really bad assumption there; the truth is that the bit-level format of a "bool" object is COMPLETELY UNSPECIFIED! It doesn't have to use a no-set-bits pattern for False or a simple single-set-bit (or all-set-bits) pattern for True. One or both of the two valid states could have multiple bit patterns allowed.
3a. In fact, I made a shocking realization researching for this thread. The standard describes "bool" in section 3.9.1, paragraph 6. Unlike "wchar_t" (described in paragraph 5), the "bool" does not have to be a rip-off of another built-in integral type! It can be a distinct integral type from "char," "short," "int," or "long." Or it doesn't have to be a true integral type at all; it could just ACT like one when focused through the compiler! (Is this realization a bug? Should I report it on "comp.std.c++"?)
4. If you're using some sort of (discriminated) union, where only some of the members are valid at any time, then you must serialize only the currently active ones. It is IRRESPONSIBLE for you to do otherwise!
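Point 4 can be sketched with a modern discriminated union, `std::variant` (C++17, long after this thread): write a tag for the active alternative and then only that alternative's value, so inactive storage is never read. `Field` and `save_field` are hypothetical names, and the text format is invented for the example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <variant>

// A discriminated union: exactly one alternative is alive at a time.
using Field = std::variant<int, bool, std::string>;

// Write the index of the active alternative, then only that value.
// Inactive alternatives are never touched, so nothing indeterminate is read.
std::string save_field(const Field& f) {
    std::ostringstream out;
    out << f.index() << ' ';
    std::visit([&out](const auto& v) { out << v; }, f);
    return out.str();
}
```

A matching loader would read the index first and construct only that alternative.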
If handling uninitialised variables is not practical, then perhaps there could be a warning in the documentation that uninitialised booleans will cause stream errors on deserialisation.
Such handling isn't just impractical, it's impossible. It can NEVER be used as a legitimate programming technique. Why should we waste text describing any/every potential illegal program? It's also _not_ a problem specific to serialization; it's a general programming restriction that all of us should know already. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
Daryle Walker wrote:
3a. In fact, I made a shocking realization researching for this thread. The standard describes "bool" in section 3.9.1, paragraph 6. Unlike "wchar_t" (described in paragraph 5), the "bool" does not have to be a rip-off of another built-in integral type!
I would be shocked to discover that wchar_t does have to be a rip-off of another built-in integral type. The only compiler I test on that doesn't have wchar_t as an intrinsic type is VC 6.0. Should we be altering the jam toolset to be sure that wchar_t is a synonym for something else? Robert Ramey
On 11/13/05 2:05 PM, "Robert Ramey"
Daryle Walker wrote:
3a. In fact, I made a shocking realization researching for this thread. The standard describes "bool" in section 3.9.1, paragraph 6. Unlike "wchar_t" (described in paragraph 5), the "bool" does not have to be a rip-off of another built-in integral type!
I would be shocked to discover that wchar_t does have to be a rip-off of another built-in integral type. The only compiler I test on that doesn't have wchar_t as an intrinsic type is VC 6.0. Should we be altering the jam toolset to be sure that wchar_t is a synonym for something else?
I do _not_ mean that "wchar_t" is a "typedef"! It is considered a separate type. It is implemented like a "strong type-alias"[1] of another built-in integral type, just like "char" is a strong type-alias of either "signed char" or "unsigned char". The "bool" type could be of an underlying built-in type that is otherwise unreachable in code, and that type doesn't even have to be integral, just act that way through the compiler. [1] C and C++ don't have this feature in general, although some wish for it. Other languages do have this. I think Ada is an example. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
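The type-identity part of this point can be checked directly. A minimal sketch, using C++11's std::is_same (an anachronism for this 2005 thread), showing that char, signed char, unsigned char, wchar_t and bool are distinct types that overloads can tell apart. The helper type_name is hypothetical. What the standard leaves unspecified, such as sizeof(bool) or the bit patterns used for true and false, is deliberately not tested, since portable code cannot observe it.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// "char" is its own type, distinct from both "signed char" and "unsigned char",
// even though it shares a representation with one of them.
static_assert(!std::is_same<char, signed char>::value, "char is a distinct type");
static_assert(!std::is_same<char, unsigned char>::value, "char is a distinct type");

// "wchar_t" and "bool" are likewise distinct types, not typedefs of other integers.
static_assert(!std::is_same<wchar_t, unsigned short>::value &&
              !std::is_same<wchar_t, int>::value, "wchar_t is a distinct type");
static_assert(!std::is_same<bool, unsigned char>::value &&
              !std::is_same<bool, char>::value, "bool is a distinct type");

// Because the types are distinct, overload resolution can tell them apart.
inline const char* type_name(char)          { return "char"; }
inline const char* type_name(signed char)   { return "signed char"; }
inline const char* type_name(unsigned char) { return "unsigned char"; }
inline const char* type_name(wchar_t)       { return "wchar_t"; }
inline const char* type_name(bool)          { return "bool"; }
```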
Daryle Walker wrote:
On 11/13/05 2:05 PM, "Robert Ramey" wrote:
Daryle Walker wrote:
[1] C and C++ don't have this feature in general, although some wish for it. Other languages do have this. I think Ada is an example.
As an aside, note that the serialization library contains STRONG_TYPE, which was needed to implement the library. It's good enough for the serialization library but probably not up to "industrial strength". Maybe some enterprising individual might want to take a look at this with the idea of making an "industrial strength" Boost version. Robert Ramey
"Robert Ramey" writes:
Daryle Walker wrote:
On 11/13/05 2:05 PM, "Robert Ramey" wrote:
Daryle Walker wrote:
[1] C and C++ don't have this feature in general, although some wish for it. Other languages do have this. I think Ada is an example.
As an aside, note that the serialization library contains STRONG_TYPE, which was needed to implement the library. It's good enough for the serialization library but probably not up to "industrial strength". Maybe some enterprising individual might want to take a look at this with the idea of making an "industrial strength" Boost version.
There is no way to make such a thing "industrial strength" in today's C++, for most people's definition of "strong typedef". In fact, the one in the serialization library isn't anything like a typedef, and it's nothing like what most people mean when they say "strong typedef". It's merely a wrapper over an instance of some type that can be implicitly converted to that instance. A "strong typedef" wouldn't even necessarily have that implicit conversion; in fact, eliminating those implicit conversions is one of the main reasons some people want direct language support for strong typedefs. -- Dave Abrahams Boost Consulting www.boost-consulting.com
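Dave's description of the serialization library's wrapper (a distinct type holding a value that converts implicitly back to it) can be sketched as follows. The template name wrapped and the tag types are hypothetical; this is not the code behind Boost's macro. It also illustrates the point that comes up next in the thread: adding two instances does compile, but only because both operands convert to unsigned int, so the result is no longer the wrapped type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical wrapper in the style described above: a distinct type holding a
// T that converts implicitly back to T. This is NOT Boost's actual macro.
template <class T, class Tag>
class wrapped {
public:
    wrapped() : value_() {}
    explicit wrapped(const T& v) : value_(v) {}
    operator const T&() const { return value_; }  // implicit conversion back to T
private:
    T value_;
};

struct version_tag {};
struct object_id_tag {};
typedef wrapped<unsigned int, version_tag>   version_type;
typedef wrapped<unsigned int, object_id_tag> object_id_type;

// The types are distinct, so functions can be overloaded on them...
inline std::string describe(const version_type&)   { return "version"; }
inline std::string describe(const object_id_type&) { return "object id"; }

// ...but the implicit conversion means arithmetic silently falls back to
// unsigned int: adding two version_type values compiles, yet the result
// is a plain unsigned int, not a version_type.
static_assert(std::is_same<decltype(version_type(2) + version_type(3)),
                           unsigned int>::value,
              "sum decays to the underlying type");
```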
David Abrahams wrote:
"Robert Ramey" writes:
As an aside, note that the serialization library contains STRONG_TYPE, which was needed to implement the library. It's good enough for the serialization library but probably not up to "industrial strength". Maybe some enterprising individual might want to take a look at this with the idea of making an "industrial strength" Boost version.
There is no way to make such a thing "industrial strength" in today's C++, for most people's definition of "strong typedef". In fact, the one in the serialization library isn't anything like a typedef, and it's nothing like what most people mean when they say "strong typedef". It's merely a wrapper over an instance of some type that can be implicitly converted to that instance. A "strong typedef" wouldn't even necessarily have that implicit conversion; in fact, eliminating those implicit conversions is one of the main reasons some people want direct language support for strong typedefs.
What I needed was a type that had the functioning of, say, an unsigned integer but was of a distinguishable type. What do other people mean when they say they want a strong typedef? Robert Ramey
"Robert Ramey" writes:
What I needed was a type that had the functioning of, say, an unsigned integer but was of a distinguishable type. What do other people mean when they say they want a strong typedef?
That's roughly what they mean. But IIRC you actually needed much less than that, and so didn't even begin to attempt to implement all of it. For example, IIRC you can't add two instances of your type. -- Dave Abrahams Boost Consulting www.boost-consulting.com
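For contrast, a sketch of what people more often mean by a strong typedef, along the lines Dave describes: construction and extraction are explicit, and arithmetic stays within the type. Everything here (strong, get(), the tag types) is a hypothetical illustration using C++11 features, not an existing Boost facility; a fuller version would also need the remaining operators.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical "stronger" typedef: construction and extraction are explicit,
// and arithmetic is defined only between values of the same strong type.
template <class T, class Tag>
class strong {
public:
    constexpr strong() : value_() {}
    constexpr explicit strong(T v) : value_(v) {}
    constexpr T get() const { return value_; }  // the only way back to T

    friend constexpr strong operator+(strong a, strong b) { return strong(a.value_ + b.value_); }
    friend constexpr bool operator==(strong a, strong b) { return a.value_ == b.value_; }
private:
    T value_;
};

struct version_tag {};
struct object_id_tag {};
typedef strong<unsigned int, version_tag>   version_type;
typedef strong<unsigned int, object_id_tag> object_id_type;

// No implicit conversions in either direction...
static_assert(!std::is_convertible<version_type, unsigned int>::value, "no implicit decay");
static_assert(!std::is_convertible<unsigned int, version_type>::value, "no implicit wrap");
// ...no mixing of different strong types...
static_assert(!std::is_convertible<version_type, object_id_type>::value, "distinct types");
// ...and adding two instances yields the strong type itself.
static_assert(std::is_same<decltype(version_type(2) + version_type(3)), version_type>::value,
              "sum keeps the strong type");
```

The price is that every operation a user wants (subtraction, ordering, streaming) has to be written out explicitly, which is part of why a general "industrial strength" version is hard to get right.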
David Abrahams wrote:
"Robert Ramey" writes:
What I needed was a type that had the functioning of, say, an unsigned integer but was of a distinguishable type. What do other people mean when they say they want a strong typedef?
That's roughly what they mean. But IIRC you actually needed much less than that, and so didn't even begin to attempt to implement all of it. For example, IIRC you can't add two instances of your type.
The following program does compile:
David Abrahams wrote:
"Robert Ramey" writes:
What I needed was a type that had the functioning of, say, an unsigned integer but was of a distinguishable type. What do other people mean when they say they want a strong typedef?
That's roughly what they mean. But IIRC you actually needed much less than that, and so didn't even begin to attempt to implement all of it. For example, IIRC you can't add two instances of your type.
The following program DOES compile. You're probably right that I needed less. I remember I tried to make something of wider applicability than the serialization library. Actually, it's easier on my brain to think in terms of something more general than in terms of the narrower needs of a particular application. It's intriguing to me that no one has made a real Boost version of something like this, as I need it from time to time.
Robert Ramey
#include
Daryle Walker wrote:
4. If you're using some sort of (discriminated) union, where only some of the members are valid at any time, then you must serialize only the currently active ones. It is IRRESPONSIBLE for you to do otherwise!
As I wrote that particular bit, I'll take the heat for it ... it's not really a discriminated union, I used that as a loose description for what is actually occurring, so that's not Paul's fault :-) I think the only thing that everybody agrees on here is that uninitialised variables are bad... I'm more interested in knowing whether I could do the following... let's say I archive an object using binary archives ... everything goes back and forth with no problems; switching to text archives I'd expect the same behaviour more or less, so ideally if the value of my uninitialised int happens to be 42, then it should still be 42 when it's read back in. This is fine, but if the value goes outside of the range, say when it's not an int but a bool, then we have a problem. The simple 1 line assert appears to detect the condition that occurs on read back, which is that the serialization library cannot read back things it *successfully wrote*, as far as the calling code is able to detect. It chooses to write a bool in the form of "0" or "1", so when it wrote "205" (the value we actually found when we inspected our text archive) we could see it wasn't a valid value and quickly fixed our code's problem. It would have been nicer, in this case, if we could have spotted that the archive, although appearing to have been written without error, was in fact 'corrupt'. My approach to this would be a data paranoia mode which verifies, on loading, that the value converts back, and throws an exception rather than triggering an assert, but clearly that's more work than an assert in this case. Whichever way, we no longer have the problem. Kevin - I am not a number, I am a free NaN... -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |
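Kevin's "data paranoia" idea, applied on the loading side, might look like the sketch below. It assumes a text format in which a bool is written as the token 0 or 1, as described above; the archive_corrupt type and load_bool_checked function are hypothetical and not part of Boost.Serialization. Such a check only reports the symptom: once an uninitialized bool has been read on the saving side, the behavior is already undefined, and the real fix is to initialize it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type for a text archive whose contents cannot be trusted.
class archive_corrupt : public std::runtime_error {
public:
    explicit archive_corrupt(const std::string& what) : std::runtime_error(what) {}
};

// "Paranoid" load of a bool from a text stream, assuming the writer emits
// bool as "0" or "1". Any other token, such as "205", is reported by
// throwing rather than asserting.
inline bool load_bool_checked(std::istream& is) {
    int raw = 0;
    if (!(is >> raw))
        throw archive_corrupt("could not read a bool");
    if (raw != 0 && raw != 1)
        throw archive_corrupt("invalid bool value " + std::to_string(raw));
    return raw == 1;
}
```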
On 11/14/05 7:59 AM, "Kevin Wheatley" wrote:
Daryle Walker wrote:
4. If you're using some sort of (discriminated) union, where only some of the members are valid at any time, then you must serialize only the currently active ones. It is IRRESPONSIBLE for you to do otherwise!
As I wrote that particular bit, I'll take the heat for it ... it's not really a discriminated union, I used that as a loose description for what is actually occurring, so that's not Paul's fault :-)
I think the only thing that everybody agrees on here is uninitialised variables are bad...
Yes, reading them is bad. But the advice applies even outside of unions. If the object will always be used, then always initialize it. If it's only used sometimes, then you need a flag indicating its activity, and you skip using that object when it's inactive (or initialize and activate it).
I'm more interested in knowing whether I could do the following...
let's say I archive an object using binary archives ... everything goes back and forth with no problems; switching to text archives I'd expect the same behaviour more or less, so ideally if the value of my uninitialised int happens to be 42, then it should still be 42 when it's read back in. This is fine, but if the value goes outside of the range, say when it's not an int but a bool, then we have a problem. The simple 1 line assert appears to detect the condition that occurs on read back, which is that the serialization library cannot read back things it *successfully wrote*, as far as the calling code is able to detect.
It was NEVER successfully written, since the object was never initialized when it was read. Undefined behavior means that anything could happen. The fact that you can "get away with it" with some combinations of input does not excuse the programming technique. Your difference between input and output shows that you didn't even get away with it. (And using an "int" isn't safe; any type besides "unsigned char" may have illegitimate bit patterns that could trap.) Are you just practicing, or is there a reason why you can't apply my "initialize or skip" advice?
It chooses to write a bool in the form of "0" or "1", so when it wrote "205" (the value we actually found when we inspected our text archive) we could see it wasn't a valid value and quickly fixed our code's problem. It would have been nicer, in this case, if we could have spotted that the archive, although appearing to have been written without error, was in fact 'corrupt'. My approach to this would be a data paranoia mode which verifies, on loading, that the value converts back, and throws an exception rather than triggering an assert, but clearly that's more work than an assert in this case.
Whichever way, we no longer have the problem.
Not if you just quick-patched the symptom and not the cause. You could have just turned your 100% bug into a Schroedinbug or Heisenbug. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
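The "initialize or skip" advice can be sketched with a plain text stream. Settings, save and load are hypothetical names, not Boost.Serialization calls. Default member initializers mean nothing is left indeterminate in the first place, and the activity flag tells the loader whether the optional member is present, so an inactive member is never written or read.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record. `enabled` is always used, so it always has a value;
// `threshold` only matters when `has_threshold` is true.
struct Settings {
    bool enabled = false;        // "always initialize"
    bool has_threshold = false;  // activity flag for the optional member
    int  threshold = 0;          // also initialized, so even a stray read is defined
};

// "Skip": the optional member is written only when it is active.
inline void save(std::ostream& os, const Settings& s) {
    os << (s.enabled ? 1 : 0) << ' ' << (s.has_threshold ? 1 : 0) << ' ';
    if (s.has_threshold)
        os << s.threshold << ' ';
}

// The loader mirrors the writer: the flag decides whether threshold is read.
inline Settings load(std::istream& is) {
    Settings s;  // every member starts from its default
    int enabled = 0, has = 0;
    if (!(is >> enabled >> has))
        throw std::runtime_error("stream error");
    s.enabled = (enabled != 0);
    s.has_threshold = (has != 0);
    if (s.has_threshold && !(is >> s.threshold))
        throw std::runtime_error("stream error");
    return s;
}
```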
Daryle Walker wrote:
On 11/14/05 7:59 AM, "Kevin Wheatley" wrote:
I'm more interested in knowing whether I could do the following...
let's say I archive an object using binary archives ... everything goes back and forth with no problems; switching to text archives I'd expect the same behaviour more or less, so ideally if the value of my uninitialised int happens to be 42, then it should still be 42 when it's read back in. This is fine, but if the value goes outside of the range, say when it's not an int but a bool, then we have a problem. The simple 1 line assert appears to detect the condition that occurs on read back, which is that the serialization library cannot read back things it *successfully wrote*, as far as the calling code is able to detect.
It was NEVER successfully written, since the object was never initialized when it was read. Undefined behavior means that anything could happen. The fact that you can "get away with it" with some combinations of input does not excuse the programming technique. Your difference between input and output shows that you didn't even get away with it. (And using an "int" isn't safe; any type besides "unsigned char" may have illegitimate bit patterns that could trap.) Are you just practicing, or is there a reason why you can't apply my "initialize or skip" advice?
Daryle, you seem to be missing the point... Kevin and I have long since fixed this bug and have no interest in serialising uninitialised variables. All that is required is a note in the docs under "stream error" saying that one thing that can cause this exception is an uninitialised variable, which allows the user to go and fix the problem quickly. No user in their right mind would want to serialise uninitialised variables (leaving aside quiet NaNs). *Nothing more is required.* Paul
Paul Giaccone writes:
Daryle, you seem to be missing the point... Kevin and I have long since fixed this bug and have no interest in serialising uninitialised variables. All that is required is a note in the docs under "stream error" saying that one thing that can cause this exception is an uninitialised variable
That should be an assertion.
which allows the user to go and fix the problem quickly. No user in their right mind would want to serialise uninitialised variables (leaving aside quiet NaNs).
A quiet NaN is no more likely to be the result of an uninitialised float or double than is a float or double with any other legal value. Uninitialised floating types are normally full of random garbage just like anything else. -- Dave Abrahams Boost Consulting www.boost-consulting.com
Daryle Walker wrote:
Whichever way, we no longer have the problem.
Not if you just quick-patched the symptom and not the cause. You could have just turned your 100% bug into a Schroedinbug or Heisenbug.
No, we added the test, fixed the source, etc... then the email was sent (it wasn't supposed to generate this level of discussion :-) Like I said in the previous mail, it's not worth it ... it was only an idea ... nothing more to see here ... move along... Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |
Hello! I'm trying to use Boost.Thread in my application. I created my class with an overloaded operator() and pass its object to the thread constructor. The thread runs, it's ok. But as I understand it, when I pass my object to the thread constructor, the thread makes a copy of it. So when I try to send a message (call a method) to MY object, the running thread doesn't receive it, because it is a COPY. I solved this problem by giving the thread object a pointer to a parent object (some kind of callback), but it's not quite suitable. How can I access the copy of my object that is used by Boost.Thread? Is it possible, or is giving a pointer to the parent object the only way? I will be grateful for your assistance. Best regards, Denis.
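The copying described here can be reproduced with C++11's std::thread, used as a stand-in for the 2005 Boost.Thread interface; the worker type and helper functions are hypothetical. Passing the function object by value gives the thread its own copy, while passing std::ref(w) makes the thread call the original, which is only safe if the original outlives the thread.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// A function object that counts how many times it has been invoked.
struct worker {
    int runs = 0;
    void operator()() { ++runs; }
};

// Passing the object itself: the thread stores and invokes its own copy.
inline int runs_seen_after_copy() {
    worker w;
    std::thread t(w);       // t runs a copy of w
    t.join();
    return w.runs;          // the copy was incremented, w was not
}

// Passing std::ref(w): the thread invokes w itself (w must outlive the thread).
inline int runs_seen_after_ref() {
    worker w;
    std::thread t(std::ref(w));
    t.join();
    return w.runs;
}
```

The shared_ptr idiom in Peter's reply avoids that lifetime requirement, because the thread co-owns the object.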
Bondarenko Denis wrote:
Hello! I'm trying to use Boost.Thread in my application. I created my class with an overloaded operator() and pass its object to the thread constructor. The thread runs, it's ok. But as I understand it, when I pass my object to the thread constructor, the thread makes a copy of it. So when I try to send a message (call a method) to MY object, the running thread doesn't receive it, because it is a COPY. I solved this problem by giving the thread object a pointer to a parent object (some kind of callback), but it's not quite suitable. How can I access the copy of my object that is used by Boost.Thread? Is it possible, or is giving a pointer to the parent object the only way?
Boost.Threads doesn't expose the internal copy of the function object, but
you can use the following idiom to achieve a similar effect:
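#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread/thread.hpp>

struct my_object
{
    void run() { /* do threaded things */ }
    void message() { /* presumably lock mutex and deliver a message */ }
};
int main()
{
    boost::shared_ptr<my_object> pm( new my_object );
    boost::thread thr( boost::bind( &my_object::run, pm ) ); // the thread holds a copy of pm, not of *pm
    pm->message();
    thr.join(); // wait for run() to finish before main() returns
}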
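The same idiom can be sketched with the C++11 standard library counterparts (std::shared_ptr, std::thread, std::bind), filling in the two placeholder comments with one possible implementation: a mutex and condition variable that hand a single string from the caller to the running thread. The member bodies, the received() accessor and the deliver() helper are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical filling-in of the two member functions from the example above:
// run() waits for one message; message() delivers it under a mutex.
struct my_object {
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return has_message_; });  // sleep until message() fires
        received_ = pending_;
    }
    void message(const std::string& text) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_ = text;
            has_message_ = true;
        }
        cv_.notify_one();
    }
    std::string received() const {
        std::lock_guard<std::mutex> lock(m_);
        return received_;
    }
private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    bool has_message_ = false;
    std::string pending_, received_;
};

// Same shape as the Boost idiom: the thread and the caller share one object.
inline std::string deliver(const std::string& text) {
    std::shared_ptr<my_object> pm = std::make_shared<my_object>();
    std::thread thr(std::bind(&my_object::run, pm));  // binds a copy of the pointer, not of the object
    pm->message(text);
    thr.join();
    return pm->received();
}
```

Because message() sets the flag under the lock before notifying, it does not matter whether the thread reaches wait() before or after the message arrives.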
Peter Dimov wrote:
Bondarenko Denis wrote:
Hello! I'm trying to use Boost.Thread in my application. I created my class with an overloaded operator() and pass its object to the thread constructor. The thread runs, it's ok. But as I understand it, when I pass my object to the thread constructor, the thread makes a copy of it. So when I try to send a message (call a method) to MY object, the running thread doesn't receive it, because it is a COPY. I solved this problem by giving the thread object a pointer to a parent object (some kind of callback), but it's not quite suitable. How can I access the copy of my object that is used by Boost.Thread? Is it possible, or is giving a pointer to the parent object the only way?
Boost.Threads doesn't expose the internal copy of the function object, but you can use the following idiom to achieve a similar effect:
struct my_object { void run() { /* do threaded things */ } void message() { /* presumably lock mutex and deliver a message */ } };
int main() { boost::shared_ptr<my_object> pm( new my_object ); boost::thread thr( boost::bind( &my_object::run, pm ) ); pm->message(); }
Hi, Peter! Thank you very much. It works well. It looks like I should learn more about Boost.Bind :) Best regards, Denis
participants (14)
- Bondarenko Denis
- Daryle Walker
- David Abrahams
- Edward Diener
- gast128
- Ian McCulloch
- Kevin Wheatley
- Nigel Rantor
- Noel Yap
- Paul Giaccone
- Peter Dimov
- Robert Ramey
- Stephen Torri
- troy d. straszheim