[variant2] never-empty guarantee considered harmful

12 Apr 2019

      Hi Everyone,
I waned to re-iterate my concerns regarding the never-empty guarantee
provided by variant2 (as well as by boost::variant). Sorry for starting a
yet another thread, but I think it necessary to cleanse, group and
structure the information scattered among a number of threads.

I think that the never-empty variant deceives people into thinking that it
offers something useful and important, but in fact it does not, and more,
it can encourage people to trust that they have some guarantee which they
haven't, and owing to this to write buggy code.

The solution that never-empty variant offers is similar to another common
idea of replacing undefined behavior with *any *defined behavior
whatsoever, and that is supposed to be "safe" because programmers no longer
see symptoms of their bugs.

Let me digress on the subject of UB for a second. UB is always caused by a
bug in the program. Thus if the program had no bugs, it would have no UB.
But many people are very afraid of UB per se, and less afraid of bugs. They
will divert much attention to prevent UB at surprisingly high cost, but
focus less on preventing their bugs. UB is a useful symptom of a bug and
therefore can help detect and remove bugs, but people are sometimes so
afraid of UB that they loose this connection to the extent that they are
willing to remove the link between the two and get rid of potential UB even
at the cost of concealing the bugs. And they consider it "safe".

For instance, someone writes a sequence container class and faces the
problem. I will need to provide access to elements by index, but people
will give me bad indexes (because they can), this would be a UB, and I do
not want a UB, therefore I apply a "safety" feature. Whenever someone gives
me bad index, I will return a reference to the first element. Sure, it is
surprising, and likely not what the caller wanted, but it is better than
UB. And in case I have no elements in my container I will default-construct
one in the container. Sure it is not nice to to increase the size of the
container only because someone gave me the wrong index, but... *anything is
better than UB.*

I wonder if everyone reading this mail will agree with me that reasoning
"anything is better than UB" is wrong. The author of this "safe" container
calls his container "safe" because whatever bugs his callers have, they
will never trigger UB *in his library*. Maybe in another library, at
another level, but not in his component. In the end we will get programs
that will cause self-driving cars to crash and kill people, space rockets
to explode, but the "safe" program itself will never crash.

Instead the author of the "safe" container should be focusing on something
else. Why is my caller giving me the wrong index given that he can check my
size and easily determine that this index is wrong? The answer is, the
programmer probably intended to do something else and this needs to be
brought to his attention. Tools like compilers and static analyzers cannot
detect bugs in general because they cannot guess our intentions, but they
can detect things like UB, so if we keep a correspondence between bugs and
UB, we can detect bugs by detecting UB. This has been described in more
detail in this paper:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1043r0.html
(section 6).

Now, the situation is similar with the never-empty guarantee of the variant
that offers only a basic exception safety guarantee on assignment. Because
the guarantee is basic (that is, it is not the strong guarantee), the only
thing we can do with it in order not to risk program bugs is, following the
Dave's article (https://www.boost.org/community/exception_safety.html), to
either destroy or reset the variant that threw from assignment. And this is
what correct programs do, and the technique is quite simple: unless objects
provide strong exception safety guarantee (or some other guarantee that
allows you to reason about the values of objects that threw) they should
die in the stack unwinding process. The damage that the never-empty
guarantee (implemented by setting unexpected values) does is to encourage
people to think that with variant they can do something more than just
destroy or reset. And while technically, variant itself is "clean" in this
case (it really doesn't invoke UB itself) it causes the code that is using
it to be incorrect. Two examples.

I may have the following instance:

```
variant2<X, Y, shared_ptr<Z>> v {X{"params"}};

struct Visitor
{
  int operator()(shared_ptr<Z> const& z) { return z->getCount(); }
  int operator()(X const&) { return 1; }
  int operator()(Y const&) { return 1; }
};
```

Where X and Y have nasty properties (are declared to potentially throw from
move constructor and do not have default constructors).
My v could technically store a null pointer at some point. In the visitor I
do not check for this eventuality. Someone could say, "you should always
check for null pointer, because anything is better than UB", But I know I
never will be storing a null pointer there. I use shared_ptr for its
ability to share ownership of an object, not for its ability to be null. I
guarantee that I will never set the shared_ptr<Z> alternative to nullptr
value. Therefore in the visitor I can assume that the pointer is never null.

But if I am encouraged by the variant's never-empty guarantee to depart
from Dave's guidance not to trust basic-guarantee objects that threw, I may
end up in the situation where I start with alternative X, I want to assign
alternative Y, but it throws, and because the only type with the default
constructor is shared_ptr<Z>, I will get a nullptr value which I never
intended. This is not bad yet. But now, if I apply my visitor on my
variant, my precondition that the pointer is not null will be violated:

```
variant2<X, Y, shared_ptr<Z>> v {X{"params"}};
BOOST_SCOPE_GUARD(&v) { visit(Visitor{}, v); };
v.emplace<Y>("params"); // throws
```

Now, variant2 fulfills its guarantee: it stores one of the alternatives (of
arbitrary choice and arbitrary value), but it injected me the worst
possible value of the type with the weak invariant. Thus, variant2's
invariant may be strong, but it exploits the weak invariants of the types
of the alternatives. And unfortunately, a lot of types, even in the STD,
have weak invariants and use the default constructor to set the degenerate
state. Thus, my program has UB, and hopefully a crash, because I was
deceived by the never-empty guarantee: I thought it would be "safe" to
observe such variant that threw.

Second example comes from Peter:

```
template<class T> class monitor
{
  T const& v_;
  T old_;
public:
  explicit monitor(T const& v) : v_(v), old_(v) {}

  ~monitor() {
    if( v_ != old_ )
      std::clog << "value changed";
  }
};
```

Suppose I use it like this:

```
{
  variant2<X, Y, shared_ptr<Z>> v {shared_ptr<Z>{}};
  // let me check if this code sets the v to a non-degenerate value:
  monitor m {v};
  v.emplace<X>("params"); // succeeds
  v.emplace<Y>("param"); // fails: null pointer set.
} // no value change reported
```

although the only state changes I initiated in this block were to set types
X and Y, I got the wrong result from the monitor, because the "safe"
guarantee of the variant chose some state that corrupted the monitoring
logic. variant2 delivers its guarantee as promised: some value is there,
but my monitoring is compromised because of its arbitrary choice of value.
Note that std::variant does not have this problem: a distinct state is set
upon such exception that is always different than any value of any of the
alternatives.

Of course, this second example has problems of its own. It assumes you can
invoke operator== on an arbitrary T of which we know nothing. It assumes
that T's invariant allows the values to be compared, but this is not
correct an assumption. Type T can have a weak invariant: one that does not
even allow comparing two values. You cannot just invoke arbitrary
operations on an unknown T in the destructor, if you do not know they
exception safety guarantees. T's comparison could potentially throw. Even
destructors of generic components such as vector<T> or optional<T> in their
non-trivial implementations only invoke operations on concrete types:
pointers and bools, but for the T they only call destructors.

Peter argues that a call to the operation on a variant could be invoked
indirectly form the destructor: destructor calls function f(), function f
calls function g(), function g() calls function f() and function h() makes
a visitation on the variant. But can you see what this means? I am calling
some function f() from the destructor, which is supposed to clean up
resources, and I am not sure what f() does, if it does some hidden things
beyond my control. This is a buggy code already.

Now, to address another argument in favor of never-empty,
arbitrary-value-choosing guarantee. Emil says that with never
empty-guarantee we can more easily reason about the variant: it is *always*
one of the alternatives: rather that "either one of the alternatives or
some degenerate state. He argues that once we allow the faintest
possibility for the degenerate state, we have to check for it *everywhere*.

I think that the simple binary model, "either the weak state does not exist
or is potentially everywhere", while it often works for many types is, not
adequate to describe the reality of the variant. People need models (which
are simplifications of reality) to help them manage complex reality. But
models have the downside of not allowing people to see the reality beyond
what the models allows them to see.

We ave to distinguish "ordinary" weak state and "unusual circumstance" weak
state. Types like unique_ptr or fstream have ordinary weak state: it is
created by a constructor. Worse, by the default constructor, so if you just
declare the variable name and forget about parameters you immediately get
the weak state. It is easy to obtain this state. And therefore in many
places we need to be prepare to handle them.

But other types (which we have fewer in STD: only std::variant, I think)
can have only "unusual circumstance" weak state: you cannot initialize the
object to contain this state (unless you move/copy construct from another
object that already has one). You can only get it in situations that for
other reasons require special caution from programmers. I am aware of two:
* when an object is manually moved from (by explicitly casting an lvalue to
an rvalue reference, which std::move() does)
* when a mutable operation on an object with basic exception safety
guarantee threw an exception.

In the first situation, in a generic context where we work with an unknown
type T, after the move an object is in a valid but unspecified state. Type
T is allowed to have a very weak invariant: so weak that even operator==
may not be allowed to be invoked. In this state the only correct thing we
can do is to either let the object die or reset its value (the only common
way for resetting the value for different types is assignment of a
non-degenerate value). Working under the assumption that programmers are
aware of these constraints and are willing to adhere to them, we get
confidence that these moved-from values will not be observed. (Of course
this does not apply to types which choose to guarantee more after move.)

We can design a resource-handling type, such as a socket, whose default
constructor acquires the first available port, and leaves the object in a
"strong" state, ready to work. there is no way to initialize objects of
this type to a degenerate value. but if these sockets are
move-constructible, the moved from object will have to store the degenerate
value. But it would be pointless to check it anywhere in the program. We
could only see it if programmers did not adhere to the rule, "only assign
to or destroy the moved from objects, unless you know your object offers
more guarantees". If programmers do not adhere to this rule, rather than
covering up for their bugs, socket should make the UB as explicit as
possible in order to assist static analyzers and tools in detecting such
programmer's misuse of the moved-from state.

Similarly with the after-throw state from the basic guarantee operation.
The language cannot enforce this, but programmers are required to adhere to
the rule: the only correct operations to be invoked in that case is to
destroy the object or to reset it. again, if programmers do other things
this is a bug and the type should try to reflect that in a controlled UB,
rather than pretending that everything is fine and causing bugs later in
unpredictable places, as was shown in the examples above.

Insisting that unusual-circumstance weak state should be checked
*everywhere* is depriving oneself of the ability to see the engineering
trade-offs clearly. We do not have to distinguish "ordinary" and
"unusual-circumstance" weak states in most of other types (nearly all
types) because they do not pose so many implementation difficulties as
variant does: implementing never empty guarantee costs too much in terms of
run-time or spatial efficiency. For the sake of the simplicity of the model
we cannot sacrifice too much efficiency.

Given that efficient implementation of variant does not fit into the simple
model, "either I have a strong invariant or not", one option is to provide
a misleading and/or never-empty guarantee. But another option is to
acknowledge that the simple model is no longer sufficient, and adapt a new
one: more complex but one that better reflects the reality: we have two
invariants: one that applies in normal code (where every function can be
freely called and a weaker one that applies in circumstances that already
require special caution from programmers. This is the C++ way: we deal with
UB which some other languages do not have at all:this is more complex a
task but allows for faster binaries and better bug detection. In a similar
vein paper "Fixing Relations" (
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1248r0.html)  has
been recently adopted in the C++ Standards Committee. The proposal
compromised a nice and simple mathematical model, but this part of the
model stood in a way to use some STL algorithms efficiently.

C++ is not an easy language, and by trying to make it appear simpler than
it is, we might only confuse the new programmers.

One more thing, it is not the surprising value that variant2 may set upon
the exception that bothers me the most. It is rather that indirectly it
encourages programmers to think that it is fine or "safe" to read the value
of the object that has been moved from or that threw from the
basic-guarantee operation. We do not want to suggest that.

Finally, all the above reasoning would not apply if variant offered a
strong guarantee on assignment/emplacement for cases where the alternative
needs to be changed. In that case:
1. It is perfectly fine to observe the state of the object that threw.
2. No surprising, potentially weak-state, value is ever fabricated in this
case.

Regards,
&rzej;

Andrzej Krzemienski

Frank Mori Hess

Rainer Deyke

Christian Mazakas

Andrzej Krzemienski

Andrzej Krzemienski

tags

participants (4)