[variant2] never-empty guarantee considered harmful
Hi Everyone, I waned to re-iterate my concerns regarding the never-empty guarantee provided by variant2 (as well as by boost::variant). Sorry for starting a yet another thread, but I think it necessary to cleanse, group and structure the information scattered among a number of threads. I think that the never-empty variant deceives people into thinking that it offers something useful and important, but in fact it does not, and more, it can encourage people to trust that they have some guarantee which they haven't, and owing to this to write buggy code. The solution that never-empty variant offers is similar to another common idea of replacing undefined behavior with *any *defined behavior whatsoever, and that is supposed to be "safe" because programmers no longer see symptoms of their bugs. Let me digress on the subject of UB for a second. UB is always caused by a bug in the program. Thus if the program had no bugs, it would have no UB. But many people are very afraid of UB per se, and less afraid of bugs. They will divert much attention to prevent UB at surprisingly high cost, but focus less on preventing their bugs. UB is a useful symptom of a bug and therefore can help detect and remove bugs, but people are sometimes so afraid of UB that they loose this connection to the extent that they are willing to remove the link between the two and get rid of potential UB even at the cost of concealing the bugs. And they consider it "safe". For instance, someone writes a sequence container class and faces the problem. I will need to provide access to elements by index, but people will give me bad indexes (because they can), this would be a UB, and I do not want a UB, therefore I apply a "safety" feature. Whenever someone gives me bad index, I will return a reference to the first element. Sure, it is surprising, and likely not what the caller wanted, but it is better than UB. And in case I have no elements in my container I will default-construct one in the container. Sure it is not nice to to increase the size of the container only because someone gave me the wrong index, but... *anything is better than UB.* I wonder if everyone reading this mail will agree with me that reasoning "anything is better than UB" is wrong. The author of this "safe" container calls his container "safe" because whatever bugs his callers have, they will never trigger UB *in his library*. Maybe in another library, at another level, but not in his component. In the end we will get programs that will cause self-driving cars to crash and kill people, space rockets to explode, but the "safe" program itself will never crash. Instead the author of the "safe" container should be focusing on something else. Why is my caller giving me the wrong index given that he can check my size and easily determine that this index is wrong? The answer is, the programmer probably intended to do something else and this needs to be brought to his attention. Tools like compilers and static analyzers cannot detect bugs in general because they cannot guess our intentions, but they can detect things like UB, so if we keep a correspondence between bugs and UB, we can detect bugs by detecting UB. This has been described in more detail in this paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1043r0.html (section 6). Now, the situation is similar with the never-empty guarantee of the variant that offers only a basic exception safety guarantee on assignment. Because the guarantee is basic (that is, it is not the strong guarantee), the only thing we can do with it in order not to risk program bugs is, following the Dave's article (https://www.boost.org/community/exception_safety.html), to either destroy or reset the variant that threw from assignment. And this is what correct programs do, and the technique is quite simple: unless objects provide strong exception safety guarantee (or some other guarantee that allows you to reason about the values of objects that threw) they should die in the stack unwinding process. The damage that the never-empty guarantee (implemented by setting unexpected values) does is to encourage people to think that with variant they can do something more than just destroy or reset. And while technically, variant itself is "clean" in this case (it really doesn't invoke UB itself) it causes the code that is using it to be incorrect. Two examples. I may have the following instance: ``` variant2<X, Y, shared_ptr<Z>> v {X{"params"}}; struct Visitor { int operator()(shared_ptr<Z> const& z) { return z->getCount(); } int operator()(X const&) { return 1; } int operator()(Y const&) { return 1; } }; ``` Where X and Y have nasty properties (are declared to potentially throw from move constructor and do not have default constructors). My v could technically store a null pointer at some point. In the visitor I do not check for this eventuality. Someone could say, "you should always check for null pointer, because anything is better than UB", But I know I never will be storing a null pointer there. I use shared_ptr for its ability to share ownership of an object, not for its ability to be null. I guarantee that I will never set the shared_ptr<Z> alternative to nullptr value. Therefore in the visitor I can assume that the pointer is never null. But if I am encouraged by the variant's never-empty guarantee to depart from Dave's guidance not to trust basic-guarantee objects that threw, I may end up in the situation where I start with alternative X, I want to assign alternative Y, but it throws, and because the only type with the default constructor is shared_ptr<Z>, I will get a nullptr value which I never intended. This is not bad yet. But now, if I apply my visitor on my variant, my precondition that the pointer is not null will be violated: ``` variant2<X, Y, shared_ptr<Z>> v {X{"params"}}; BOOST_SCOPE_GUARD(&v) { visit(Visitor{}, v); }; v.emplace<Y>("params"); // throws ``` Now, variant2 fulfills its guarantee: it stores one of the alternatives (of arbitrary choice and arbitrary value), but it injected me the worst possible value of the type with the weak invariant. Thus, variant2's invariant may be strong, but it exploits the weak invariants of the types of the alternatives. And unfortunately, a lot of types, even in the STD, have weak invariants and use the default constructor to set the degenerate state. Thus, my program has UB, and hopefully a crash, because I was deceived by the never-empty guarantee: I thought it would be "safe" to observe such variant that threw. Second example comes from Peter: ``` template<class T> class monitor { T const& v_; T old_; public: explicit monitor(T const& v) : v_(v), old_(v) {} ~monitor() { if( v_ != old_ ) std::clog << "value changed"; } }; ``` Suppose I use it like this: ``` { variant2<X, Y, shared_ptr<Z>> v {shared_ptr<Z>{}}; // let me check if this code sets the v to a non-degenerate value: monitor m {v}; v.emplace<X>("params"); // succeeds v.emplace<Y>("param"); // fails: null pointer set. } // no value change reported ``` although the only state changes I initiated in this block were to set types X and Y, I got the wrong result from the monitor, because the "safe" guarantee of the variant chose some state that corrupted the monitoring logic. variant2 delivers its guarantee as promised: some value is there, but my monitoring is compromised because of its arbitrary choice of value. Note that std::variant does not have this problem: a distinct state is set upon such exception that is always different than any value of any of the alternatives. Of course, this second example has problems of its own. It assumes you can invoke operator== on an arbitrary T of which we know nothing. It assumes that T's invariant allows the values to be compared, but this is not correct an assumption. Type T can have a weak invariant: one that does not even allow comparing two values. You cannot just invoke arbitrary operations on an unknown T in the destructor, if you do not know they exception safety guarantees. T's comparison could potentially throw. Even destructors of generic components such as vector<T> or optional<T> in their non-trivial implementations only invoke operations on concrete types: pointers and bools, but for the T they only call destructors. Peter argues that a call to the operation on a variant could be invoked indirectly form the destructor: destructor calls function f(), function f calls function g(), function g() calls function f() and function h() makes a visitation on the variant. But can you see what this means? I am calling some function f() from the destructor, which is supposed to clean up resources, and I am not sure what f() does, if it does some hidden things beyond my control. This is a buggy code already. Now, to address another argument in favor of never-empty, arbitrary-value-choosing guarantee. Emil says that with never empty-guarantee we can more easily reason about the variant: it is *always* one of the alternatives: rather that "either one of the alternatives or some degenerate state. He argues that once we allow the faintest possibility for the degenerate state, we have to check for it *everywhere*. I think that the simple binary model, "either the weak state does not exist or is potentially everywhere", while it often works for many types is, not adequate to describe the reality of the variant. People need models (which are simplifications of reality) to help them manage complex reality. But models have the downside of not allowing people to see the reality beyond what the models allows them to see. We ave to distinguish "ordinary" weak state and "unusual circumstance" weak state. Types like unique_ptr or fstream have ordinary weak state: it is created by a constructor. Worse, by the default constructor, so if you just declare the variable name and forget about parameters you immediately get the weak state. It is easy to obtain this state. And therefore in many places we need to be prepare to handle them. But other types (which we have fewer in STD: only std::variant, I think) can have only "unusual circumstance" weak state: you cannot initialize the object to contain this state (unless you move/copy construct from another object that already has one). You can only get it in situations that for other reasons require special caution from programmers. I am aware of two: * when an object is manually moved from (by explicitly casting an lvalue to an rvalue reference, which std::move() does) * when a mutable operation on an object with basic exception safety guarantee threw an exception. In the first situation, in a generic context where we work with an unknown type T, after the move an object is in a valid but unspecified state. Type T is allowed to have a very weak invariant: so weak that even operator== may not be allowed to be invoked. In this state the only correct thing we can do is to either let the object die or reset its value (the only common way for resetting the value for different types is assignment of a non-degenerate value). Working under the assumption that programmers are aware of these constraints and are willing to adhere to them, we get confidence that these moved-from values will not be observed. (Of course this does not apply to types which choose to guarantee more after move.) We can design a resource-handling type, such as a socket, whose default constructor acquires the first available port, and leaves the object in a "strong" state, ready to work. there is no way to initialize objects of this type to a degenerate value. but if these sockets are move-constructible, the moved from object will have to store the degenerate value. But it would be pointless to check it anywhere in the program. We could only see it if programmers did not adhere to the rule, "only assign to or destroy the moved from objects, unless you know your object offers more guarantees". If programmers do not adhere to this rule, rather than covering up for their bugs, socket should make the UB as explicit as possible in order to assist static analyzers and tools in detecting such programmer's misuse of the moved-from state. Similarly with the after-throw state from the basic guarantee operation. The language cannot enforce this, but programmers are required to adhere to the rule: the only correct operations to be invoked in that case is to destroy the object or to reset it. again, if programmers do other things this is a bug and the type should try to reflect that in a controlled UB, rather than pretending that everything is fine and causing bugs later in unpredictable places, as was shown in the examples above. Insisting that unusual-circumstance weak state should be checked *everywhere* is depriving oneself of the ability to see the engineering trade-offs clearly. We do not have to distinguish "ordinary" and "unusual-circumstance" weak states in most of other types (nearly all types) because they do not pose so many implementation difficulties as variant does: implementing never empty guarantee costs too much in terms of run-time or spatial efficiency. For the sake of the simplicity of the model we cannot sacrifice too much efficiency. Given that efficient implementation of variant does not fit into the simple model, "either I have a strong invariant or not", one option is to provide a misleading and/or never-empty guarantee. But another option is to acknowledge that the simple model is no longer sufficient, and adapt a new one: more complex but one that better reflects the reality: we have two invariants: one that applies in normal code (where every function can be freely called and a weaker one that applies in circumstances that already require special caution from programmers. This is the C++ way: we deal with UB which some other languages do not have at all:this is more complex a task but allows for faster binaries and better bug detection. In a similar vein paper "Fixing Relations" ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1248r0.html) has been recently adopted in the C++ Standards Committee. The proposal compromised a nice and simple mathematical model, but this part of the model stood in a way to use some STL algorithms efficiently. C++ is not an easy language, and by trying to make it appear simpler than it is, we might only confuse the new programmers. One more thing, it is not the surprising value that variant2 may set upon the exception that bothers me the most. It is rather that indirectly it encourages programmers to think that it is fine or "safe" to read the value of the object that has been moved from or that threw from the basic-guarantee operation. We do not want to suggest that. Finally, all the above reasoning would not apply if variant offered a strong guarantee on assignment/emplacement for cases where the alternative needs to be changed. In that case: 1. It is perfectly fine to observe the state of the object that threw. 2. No surprising, potentially weak-state, value is ever fabricated in this case. Regards, &rzej;
On Fri, Apr 12, 2019 at 7:26 PM Andrzej Krzemienski via Boost <boost@lists.boost.org> wrote:
UB is a useful symptom of a bug and therefore can help detect and remove bugs
It seems like you are using a different definition of undefined behavior than what I understand it to be. UB is not a useful symptom because it can be anything, and can vary by implementation. It doesn't necessarily mean the program crashes, it could for example result in exactly the harmless unnoticed response to a bug you object to.
On 13.04.19 01:47, Frank Mori Hess via Boost wrote:
On Fri, Apr 12, 2019 at 7:26 PM Andrzej Krzemienski via Boost <boost@lists.boost.org> wrote:
UB is a useful symptom of a bug and therefore can help detect and remove bugs
It seems like you are using a different definition of undefined behavior than what I understand it to be. UB is not a useful symptom because it can be anything, and can vary by implementation. It doesn't necessarily mean the program crashes, it could for example result in exactly the harmless unnoticed response to a bug you object to.
UB is not a runtime check that detects bugs (although compiling with undefined behavior sanitizer turns it into one). It is a conceptual tool for verifying the correctness of your program. If your program invokes undefined behavior, it is incorrect. Conversely, if your program is correct, then it does not invoke undefined behavior. Undefined behavior is not a defect of the C++ language. It's a deliberate feature. The standards committee could have easily defined the result of reading an uninitialized variable as "whatever arbitrary value happens to occupy that memory location". Instead, they chose to mark it as undefined behavior, because a program that uses uninitialized variables is /wrong/. This bears repeating. Reading from an uninitialized variable is not wrong because it is undefined behavior, but the other way around. Reading from an uninitialized variable is undefined behavior because it is wrong. -- Rainer Deyke (rainerd@eldwood.com)
While UB sanitizers and other tools like valgrind are incredibly powerful and useful, a test suite is and always will be more portable, reliable and self-documenting. Emphasizing UB for the sake of debugging or ensuring correctness is not as effective as emphasizing test suites instead. A strong guarantee gives sufficient reasoning and logical guarantees for programmers. - Chris On Fri, Apr 12, 2019 at 11:06 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:
On 13.04.19 01:47, Frank Mori Hess via Boost wrote:
On Fri, Apr 12, 2019 at 7:26 PM Andrzej Krzemienski via Boost <boost@lists.boost.org> wrote:
UB is a useful symptom of a bug and therefore can help detect and remove bugs
It seems like you are using a different definition of undefined behavior than what I understand it to be. UB is not a useful symptom because it can be anything, and can vary by implementation. It doesn't necessarily mean the program crashes, it could for example result in exactly the harmless unnoticed response to a bug you object to.
UB is not a runtime check that detects bugs (although compiling with undefined behavior sanitizer turns it into one). It is a conceptual tool for verifying the correctness of your program. If your program invokes undefined behavior, it is incorrect. Conversely, if your program is correct, then it does not invoke undefined behavior.
Undefined behavior is not a defect of the C++ language. It's a deliberate feature. The standards committee could have easily defined the result of reading an uninitialized variable as "whatever arbitrary value happens to occupy that memory location". Instead, they chose to mark it as undefined behavior, because a program that uses uninitialized variables is /wrong/.
This bears repeating. Reading from an uninitialized variable is not wrong because it is undefined behavior, but the other way around. Reading from an uninitialized variable is undefined behavior because it is wrong.
-- Rainer Deyke (rainerd@eldwood.com)
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
sob., 13 kwi 2019 o 21:33 Christian Mazakas via Boost <boost@lists.boost.org> napisał(a):
While UB sanitizers and other tools like valgrind are incredibly powerful and useful, a test suite is and always will be more portable, reliable and self-documenting. Emphasizing UB for the sake of debugging or ensuring correctness is not as effective as emphasizing test suites instead.
A strong guarantee gives sufficient reasoning and logical guarantees for programmers.
Let me offer my perspective on this. When I say "UB helping tools detect bugs" I do not primarily mean UB sanitizers or Valgrind, or running the program inside the debugger , which detect bugs while the program is running. I mean static analysis tools which do not require the program to be run, such as clang's static analyzer: they just examine the program's abstract syntax tree and it s through this analysis that bugs are detected. There is no need to compare the effectiveness of static analysis against the test suite: the two are complementary and should be both exercised on a program to ensure correctness. Also, in test suites, UB is helpful. When pieces of the program need to be run, you can compile with UB-sanitizer enabled and your test suite can detect all UB-s in a uniform way. And each UB is a symptom of a bug. UB sanitizers are available practically on any compiler ever since constexpr has been added to C++. During the compile-time evaluation of constexpr function every UB is required to be detected and reported as compile time error. Since compiler vendors are required to implement it for constant evaluation, providing it later also for run-time evaluation costs them practically nothing. Regards, &rzej;
- Chris
On Fri, Apr 12, 2019 at 11:06 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:
On 13.04.19 01:47, Frank Mori Hess via Boost wrote:
On Fri, Apr 12, 2019 at 7:26 PM Andrzej Krzemienski via Boost <boost@lists.boost.org> wrote:
UB is a useful symptom of a bug and therefore can help detect and remove bugs
It seems like you are using a different definition of undefined behavior than what I understand it to be. UB is not a useful symptom because it can be anything, and can vary by implementation. It doesn't necessarily mean the program crashes, it could for example result in exactly the harmless unnoticed response to a bug you object to.
UB is not a runtime check that detects bugs (although compiling with undefined behavior sanitizer turns it into one). It is a conceptual tool for verifying the correctness of your program. If your program invokes undefined behavior, it is incorrect. Conversely, if your program is correct, then it does not invoke undefined behavior.
Undefined behavior is not a defect of the C++ language. It's a deliberate feature. The standards committee could have easily defined the result of reading an uninitialized variable as "whatever arbitrary value happens to occupy that memory location". Instead, they chose to mark it as undefined behavior, because a program that uses uninitialized variables is /wrong/.
This bears repeating. Reading from an uninitialized variable is not wrong because it is undefined behavior, but the other way around. Reading from an uninitialized variable is undefined behavior because it is wrong.
-- Rainer Deyke (rainerd@eldwood.com)
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
sob., 13 kwi 2019 o 08:06 Rainer Deyke via Boost <boost@lists.boost.org> napisał(a):
On 13.04.19 01:47, Frank Mori Hess via Boost wrote:
On Fri, Apr 12, 2019 at 7:26 PM Andrzej Krzemienski via Boost <boost@lists.boost.org> wrote:
UB is a useful symptom of a bug and therefore can help detect and remove bugs
It seems like you are using a different definition of undefined behavior than what I understand it to be. UB is not a useful symptom because it can be anything, and can vary by implementation. It doesn't necessarily mean the program crashes, it could for example result in exactly the harmless unnoticed response to a bug you object to.
UB is not a runtime check that detects bugs (although compiling with undefined behavior sanitizer turns it into one). It is a conceptual tool for verifying the correctness of your program. If your program invokes undefined behavior, it is incorrect. Conversely, if your program is correct, then it does not invoke undefined behavior.
Exactly. UB is not a tool for detecting bugs *at run-time* (even though it is possible today). It is too late to detect bugs at run-time. It is for detecting the bugs before the program is run for the first time: at static analysis time, or code-review time. Regards, &rzej;
Undefined behavior is not a defect of the C++ language. It's a deliberate feature. The standards committee could have easily defined the result of reading an uninitialized variable as "whatever arbitrary value happens to occupy that memory location". Instead, they chose to mark it as undefined behavior, because a program that uses uninitialized variables is /wrong/.
This bears repeating. Reading from an uninitialized variable is not wrong because it is undefined behavior, but the other way around. Reading from an uninitialized variable is undefined behavior because it is wrong.
-- Rainer Deyke (rainerd@eldwood.com)
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (4)
-
Andrzej Krzemienski
-
Christian Mazakas
-
Frank Mori Hess
-
Rainer Deyke