Re: [boost] New Library Proposal: dual_state

10 Jul 2005

      ...
From: boost-bounces@lists.boost.org
[mailto:boost-bounces@lists.boost.org] On Behalf Of Eelis van der Weegen
...
...
Jost, Andrew wrote:
I am curious if there is support for what I'm calling a "dual_state"
template class.

From your description it sounds a lot like Boost.Optional. What are
the main differences?

Eelis

I'll admit I did not even pause at Boost.optional when I scanned the
library listing for previous work, a failure in my ability to connect
the description, "Discriminated-union wrapper for optional values," with
the concept I had in mind.  I'm not sure if the problem is in my
unfamiliarity with the subject, the description itself, or the concept's
general ineffability.  In any case, I'll contrast Boost.optional with
dual_state to see where the differences lie.

But first, let's find a common starting point.  The Boost.optional
documentation succinctly states the purpose of that library:

	"optional<T> intends to formalize the notion of
initialization/no-initialization allowing a program to test whether an
object has been initialized"

That document also discusses various solutions to the general problem of
handling "signal" values, including cases where the cumbersome pair<T,
bool> construct is necessary because no sensible signal can be defined.
I would say that dual_state is intended to tackle the same set of
problems.  However, I see three important differences.

*GUARANTEED OBJECT DELIVERY*
First and foremost, dual_state is guaranteed to always deliver a valid
object (or reference), even if this object (or reference) must be
conjured from nowhere.  This is in direct contrast to Boost.optional,
which maintains that

	"access to the value of an uninitialized object is **undefined
behaviour**"

The above quote is also from the Boost.optional documentation (emphasis
mine).  Boost.optional and dual_state appear to address the same
problems, but with two fundamentally distinct philosophies.  Consider a
dual_state<string>.  In the following code, the char* in the final
expression is guaranteed to be valid, so the programmer needn't even
bother to check it before calling strlen:

// -- begin
typedef dual_state<std::string> state_string;
state_string str;
int len;
...
len = strlen( str->c_str() ); // always a valid char*
// -- end

Note that operator-> is used to access a member of the base type (here,
std::string).  The analogous code using Boost.optional is as follows:

// -- begin
typedef boost::optional<std::string> opt_string;
opt_string str;
int len;

len = strlen( str.get().c_str() ); // UNDEFINED!

// the correct way
if( !str ) {
    // do something else <== HERE
} else {
    len = strlen( str.get().c_str() ); // okay now
}
// -- end

A programmer may view dual_state's "guaranteed delivery" behavior as
either an advantage or a liability.  Either way, the judgment comes down
to weighing the benefit of always having a valid object against the risk
posed by what one might call "junk" default values.  I would say it
depends on the application.  More to the point, if the answer to what
goes "<== HERE" is always "set len = 0" for a particular application,
then dual_state could offer greater clarity with less work.
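
To make "guaranteed delivery" concrete, here is a minimal sketch of one
way it could work.  It is not the proposed implementation: the name
dual_state_sketch, its members, the helper length_of, and the choice of
a default-constructed T as the object delivered in the undefined state
are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch, not the proposed dual_state implementation.
// Assumption: an undefined object delivers a default-constructed T.
template <typename T>
class dual_state_sketch {
public:
    dual_state_sketch() : value_(), defined_(false) {}
    dual_state_sketch(const T& v) : value_(v), defined_(true) {}

    bool is_defined() const { return defined_; }

    // Always yields a valid pointer, whether or not a value was given.
    const T* operator->() const { return &value_; }

private:
    T value_;       // holds T() while the object is undefined
    bool defined_;
};

// No prior check is needed: c_str() is called on a real std::string.
inline std::size_t length_of(const dual_state_sketch<std::string>& s) {
    return std::strlen(s->c_str());
}

// Usage: the undefined case behaves like "set len = 0".
inline void demo_guaranteed_delivery() {
    dual_state_sketch<std::string> unset;
    dual_state_sketch<std::string> set(std::string("hello"));
    assert(length_of(unset) == 0);
    assert(length_of(set) == 5);
}
```

The cost of this design is that T must be default-constructible and that
an undefined object is indistinguishable from a defined, default-valued
one unless is_defined() is consulted.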

*FULL OPERATOR SUPPORT*
A second difference is that dual_state directly supports the full
complement of operators (for built-in types), because most operators are
delegated through the implicit conversion-to-base-type operator.  The
exceptions to implicit delegation are those operators that modify the
dual_state object itself.  The example below demonstrates two operators,
+ and +=.  Addition operates via the conversion to T (not T&) operator
and therefore returns a regular T object, at the cost of one extra copy
(a copy, incidentally, that can be avoided by using the value() member).
The compound assignment operator +=, which is defined by dual_state
itself, avoids copies and evaluates to a dual_state reference:

// -- begin
typedef dual_state<int> state_int;
state_int x;
x + 5;  // evaluates to (int); x still undefined
x += 5;  // evaluates to (state_int&); x is defined, equal to 5
x.value() + 5; // also evaluates to (int)

typedef boost::optional<int> opt_int;
opt_int y;
y + 5;  // won't compile
y += 5;  // nor will this

y.get() + 5; // evaluates to (int)
// -- end

Again, the question of which behavior is most desirable hinges on the
programmer's needs.  Boost.optional and dual_state both give the
programmer a way to manage defined and undefined objects, a way to
manage initialized and uninitialized data, and a way to sidestep
"signal" values such as EOF, -1, and npos.  Even so, the semantics of
one approach or the other may be better suited to a given application.
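
The split between delegated and wrapper-defined operators can be
sketched as follows.  This is an illustration under assumptions, not the
proposed code: the name dual_state_ops_sketch, the members is_defined()
and value(), and the T() fallback for the undefined state are all
hypothetical.

```cpp
#include <cassert>

// Hypothetical sketch; names and the T() fallback are assumptions.
template <typename T>
class dual_state_ops_sketch {
public:
    dual_state_ops_sketch() : value_(), defined_(false) {}

    bool is_defined() const { return defined_; }
    const T& value() const { return value_; }   // access without a copy

    // Conversion to T by value: built-in operators such as + apply
    // to the converted result, so they need no overloads here.
    operator T() const { return value_; }

    // A modifying operator is defined on the wrapper itself, and
    // using it moves the object into the defined state.
    dual_state_ops_sketch& operator+=(const T& rhs) {
        value_ += rhs;
        defined_ = true;
        return *this;
    }

private:
    T value_;
    bool defined_;
};

inline void demo_operators() {
    dual_state_ops_sketch<int> x;
    int sum = x + 5;            // delegated through operator T()
    assert(sum == 5);
    assert(!x.is_defined());    // reading did not define x
    x += 5;                     // wrapper-defined operator
    assert(x.is_defined());
    assert(x.value() == 5);
}
```

Only the operators that must change the wrapper's state need to be
written out; everything else falls through the conversion.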

*THE UNDEFINED_OBJECT*
A third difference between Boost.optional and dual_state involves the
very personal issue of syntax.  It is nonsense to tell people to prefer
one syntax over another, but I will note that dual_state's use of the
undefined_object lends a unique feel to code that uses it.  The
programmer's intent is unmistakably clear when "undef" is used to
initialize members:

// -- begin
using dual_state_namespace::constants::undef;

class bar { ... };
class foo {
public:
    dual_state<int> x, y, z;
    dual_state<bar> a;

    // initializers listed in declaration order (x, y, z, then a)
    foo() : x(undef), y(undef), z(undef), a(undef) {}
};
// -- end
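
One way an "undef" constant could be provided is as an object of an
empty tag type, with a constructor overloaded on that type.  The sketch
below assumes this design; only the name undef comes from the example
above, while the namespace sketch, undefined_object_t,
dual_state_sketch, and point are hypothetical.

```cpp
#include <cassert>

namespace sketch {

// Empty tag type; its only job is to select a constructor.
struct undefined_object_t {};
const undefined_object_t undef = undefined_object_t();

// Hypothetical wrapper, assumed to fall back to T() when undefined.
template <typename T>
class dual_state_sketch {
public:
    dual_state_sketch(undefined_object_t) : value_(), defined_(false) {}
    dual_state_sketch(const T& v) : value_(v), defined_(true) {}

    bool is_defined() const { return defined_; }
    const T& value() const { return value_; }

private:
    T value_;
    bool defined_;
};

// The initializer list states the intent without a comment.
struct point {
    dual_state_sketch<int> x, y;
    point() : x(undef), y(undef) {}
};

inline void demo_undef() {
    point p;
    assert(!p.x.is_defined());
    assert(!p.y.is_defined());
}

} // namespace sketch
```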

The concept of using self-evident constructs to obviate documentation is
a valuable one.  It is detailed in the Boost.noncopyable notes.

*SUMMARY*
My intent has been to articulate clearly and accurately the conceptual
differences I see between Boost.optional, a reviewed and tested library
with several years of usage history, and dual_state, a concept.  Whether
I have succeeded or failed, others may comment and judge.

--
Andrew M. Jost