
It's all in the attachment. I had several other topics I wanted to explore; however, this will have to do for now.

NOTE: The following is based on a download on 2007-02-17 of proto.zip from the boost vault.

* What is your evaluation of the design?

Fair. One problem is the "mixing" of grammars with transforms. As stated here:

  <boost-root>/libs/proto/doc/html/boost_proto/users_guide/expression_transformation/canned_transforms/pass_through.html

  all expression generator meta-functions (For example, posit<>,
  shift_right<>, function<>, nary_expr<>, etc.) have a pass-through
  transform by default.

which means that when, for example, matches<posit<_>,Expr> is compiled, the associated pass-through transform is not used. If it's not used in matches, then it should be eliminated. Of course it is used when the transform is applied; however, a transform template class taking a grammar as its argument (the proto::when template seems appropriate) should be used to do that. This would make the programmer's intention clearer. It would also mean another template instantiation, thus slowing compile times; however, I think the time saved in understanding the code would compensate for that. Of course that's a judgment call.

This "unused transform" feature is especially obvious with and_. Only the last transform (call it Xn) associated with the list of and_ args can ever be used. If transforms were eliminated from the and_ args and and_ were used only for matches, then:

    when< and_<G1,...,Gn>, Xn >

could be used with the same effect as and_ with transforms associated with the args. Here, when is more obviously a transform, Xn, enabled by the condition and_<G1,...,Gn>.
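For example (a minimal sketch; G1, G2, and X are hypothetical names for grammars and a transform, not anything in proto):

    // and_ used purely as a grammar; the single usable transform, X,
    // is attached with when instead of being hung on the and_ args:
    struct MyCase
      : proto::when<
            proto::and_< G1, G2 >  // pure grammar: matches iff both match
          , X                      // the transform applied on a match
        >
    {};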
Hm..., but that "looks" (based on what "if" means in C++) more like proto::if_; however, I've never used proto::if_ and haven't read the proto::if_ docs; hence, maybe that conclusion is all wrong. I have read somewhere that proto::if_'s 1st argument is a transform, which seems odd when the name if_ would suggest it's a grammar with a noop (unused) transform.

What about proto::or_<T0,T1,...,Tn>? If the T0,T1,...,Tn were changed to "pure" grammars, G0,G1,...,Gn, then how would the *very* useful examples of transforms such as CalcArity in:

  <boost-root>/libs/proto/doc/html/boost_proto/users_guide/expression_transformation/example__calculator_arity_transform.html

be done? Well, as explained in the in-source comments to proto::switch_ in matches.hpp, proto::switch_<Cases> could be used instead. OOPS, looking at switch_ shows it contains:

    template<typename This, typename Expr, typename State, typename Visitor>
    struct result<This(Expr, State, Visitor)>
    {
        typedef typename Cases::template case_<typename Expr::proto_tag> which;
        typedef typename which::template result<void(Expr, State, Visitor)>::type type;
    };

which, along with the example code in <boost-root>/libs/proto/test/matches.cpp, shows that the only test is on the tag of the expression and not on the children. So... maybe switch_ should be renamed to switch_tag, and a new template, switch_gram, introduced which does what the current or_ does, except that the template arguments are instances of a new template, proto::case_, where proto::case_ is like the current proto::when except:

1) The first arguments don't have to have a transform associated with them.

2) There's a default 2nd argument to proto::case_, which is identity_t (the identity transform, which is the same as the transform part of the current proto::terminal):

    struct identity_t
    {
        template<typename Sig> struct result;

        template<typename This, typename Expr, typename State, typename Visitor>
        struct result<This(Expr, State, Visitor)>
        {
            typedef Expr type;
        };

        template<typename Expr, typename State, typename Visitor>
        Expr const &operator ()(Expr const &expr, State const &, Visitor &) const
        {
            return expr;
        }
    };
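(To be concrete, the proposed case_ might be little more than when with a defaulted 2nd argument. This is a speculative sketch, not existing proto code:)

    // case_: like proto::when, but the transform argument is optional
    // and defaults to the identity transform above.
    template< typename Grammar, typename Transform = identity_t >
    struct case_
      : proto::when< Grammar, Transform >
    {};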
So, with these additional templates, the CalcArity might be:

    struct CalcArity
      : switch_gram<
            case_< terminal_g< placeholder1 >, mpl::int_<1>() >
          , case_< terminal_g< placeholder2 >, mpl::int_<2>() >
          , case_< terminal_g< _ >, mpl::int_<0>() >
          , case_< unary_expr_g< _, CalcArity >, CalcArity(_arg) >
          , case_< binary_expr_g< _, CalcArity, CalcArity >
                 , mpl::max< CalcArity(_left), CalcArity(_right) >() >
        >
    {};

where: XXX_g, for XXX={terminal,unary_expr,binary_expr}, is the "grammar part" of the current XXX template.

However, even here there's a sort of mixture of grammar and transform. The "source" grammar of the transform is the one that appeared earlier on the example__calculator_arity_transform.html page (here renamed CalcGram, and with XXX replaced by XXX_g [see the "where:" paragraph under CalcArity above]):

    struct CalcGram
      : or_<
            terminal_g< placeholder1 >
          , terminal_g< placeholder2 >
          , terminal_g< _ >
          , unary_expr_g< _, CalcGram >
          , binary_expr_g< _, CalcGram, CalcGram >
        >
    {};

In order to completely separate the grammar from the transforms, something like an algebraic homomorphism:

  http://planetmath.org/encyclopedia/HomomorphismBetweenAlgebraicSystems.html

is needed. Each function name, w_A, in the source algebra in that reference would correspond to some grammar, G_w, in the or_<G1,G2,...,Gn>. The bad thing about that is that the function names would be sort of a duplication of the grammars, I think (I haven't really thought it through). The only way I can be more certain of this is to try and implement it; however, I'm already way late in doing this review, so that suggestion about complete separation of grammar from transform is just *highly* speculative.

* What is your evaluation of the implementation?

Not done.

* What is your evaluation of the documentation?

Good. However:

1) Some transform templates are barely documented, or not documented at all except in-source (e.g. switch_). There really needs to be a full TOC like the MPL one here:

  http://www.boost.org/libs/mpl/doc/refmanual/refmanual_toc.html

I almost always go there when I want to figure out what MPL templates do.

2) Some transforms are very formally documented. IOW, they come with almost a set of rules guiding the reader, step by step, through how the transforms work. Each step in the process is described by a table in the documentation of the outermost template. These tables have the "before" expression in the 1st column and the step's resulting expression in the 2nd column. This is very good; I used it to help me understand parts of proto. However, manual execution of the rules on a given expression is hard because the user has to mentally track the intermediate steps, and that's a lot of mental bookkeeping.

It would be very helpful if there were a "proto-debugger" which, given a source expression, executed each of these steps (noting which table is being executed), showed the correspondence between names in the "rules tables" and names in the source or intermediate expressions, and finally stopped when the expression could not be reduced further. Maybe even proto could be used to do this. A sort of "self-aware" proto ;) This ability to take an expression and simplify it using a set of rules would have uses in other domains, as shown by Task 3 in Markus' post:

  http://lists.boost.org/Archives/boost/2008/03/134376.php

There's a prototype of something similar (although no association is shown between rules and tables) for language grammar analysis in the vault:

  http://www.boost-consulting.com/vault/index.php?&directory=Strings%20-%20Text%20Processing

It just uses proto::eval to go through the steps until no further change is possible.

3) There needs to be a more detailed description of how the 1st argument (the expression argument) of proto::matches "relates" to the 2nd argument (the grammar argument). The description here:

  <boost-root>/libs/proto/doc/html/boost/proto/result_of/matches.html

only uses "expression grammars" but makes no mention of the proto meta-functions for generating expressions:

  <boost-root>/libs/proto/doc/html/boost_proto/users_guide/expression_construction/tags_and_meta_functions.html

Judging from the example of using a terminal grammar, terminal<X>, to generate an expression by appending ::type:

    terminal<X>::type

I jumped to the rash conclusion that a similar technique would work for binary_expr<Tag,L,R>. IOW, that an expression could be generated by:

    binary_expr<Tag,L,R>::type

However, this is only true if L and R are already expressions. This rash conclusion led to a time-consuming "debugging of my thinking", as evidenced by the thread:

  http://lists.boost.org/Archives/boost/2008/03/133715.php

Instead, the ::type appending must be propagated to all sub-grammar expressions, as done by the morphism template attached to:

  http://lists.boost.org/Archives/boost/2008/03/134964.php
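(To illustrate the pitfall concretely — a minimal sketch; proto::tag::plus and the namespace usage are as I understand them from the vault snapshot and may differ:)

    namespace proto = boost::proto;

    // terminal<> is a meta-function: appending ::type yields an
    // expression type directly.
    typedef proto::terminal< int >::type IntTerm;

    // binary_expr<Tag,L,R>::type, by contrast, compiles only when L
    // and R are themselves already expression types...
    typedef proto::binary_expr<
        proto::tag::plus, IntTerm, IntTerm
    >::type Sum;  // OK: children are expression types

    // ...so this fails to compile: int is not an expression type.
    // typedef proto::binary_expr< proto::tag::plus, int, int >::type Bad;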
Avoiding the rash conclusion mentioned above would probably be easier if proto grammars were prominently classified on some page into the following grammar subtypes:

1) expression grammars: These are the grammars appearing in the matches.html page mentioned above. These are also expression types; hence, they can also be used as the first argument to matches. In contrast to the meta-function grammars [see 2) below], for any expression grammar, E, the following expression fails to compile:

    E::type

(Digression back to the design evaluation: This "schizophrenic" feature of expression grammars makes understanding matches harder, in a similar way that untyped variables in an interpreted language make understanding that code's functions harder. If there were an as_gram<ExprType> meta-function which returned the corresponding meta-function grammar generator, and only meta-function and predicate grammars [see below] were allowed as the 2nd argument to matches, then this would ease understanding and probably debugging [since, hopefully, deciphering compiler error messages would be easier]. Alternately, there could be an is_gram<ExprType> that, when used in some kind of debug mode, diagnosed, with clearer error messages than the compiler's, that ExprType was not a valid grammar. This is_gram could be implemented using the Boost concept checking library, AFAICT.)

2) meta-function grammars: These (except for the nullary ones) appear in the last column of the table on:

  <boost-root>/libs/proto/doc/html/boost_proto/users_guide/expression_construction/tags_and_meta_functions.html

As the name, meta-function, implies, they all have a nested ::type which returns the corresponding expression type [a.k.a. expression grammar, see 1) immediately above]. These are classified as nullary or non-nullary depending on the number of "sub-grammars":

nullary) These are the ones formed by the proto::op::terminal template. They have no sub-grammars.

non-nullary) These have an associated number (the arity) defining the number of sub-grammars. Each non-nullary template is also associated with its own C++ operator. The arity of the grammar is the same as the arity of the associated C++ operator.

definition: meta-function grammar instance:

basis step) A nullary meta-function grammar is a meta-function grammar instance.

induction step) If O is a non-nullary meta-function grammar template with arity N, and O1_inst, O2_inst, ..., ON_inst are all meta-function grammar instances, then O<O1_inst, O2_inst, ..., ON_inst> is a meta-function grammar instance.

3) predicate grammars: These are grammars with no corresponding C++ operators. In contrast to meta-function grammars, for any predicate grammar, B, the following expressions fail to compile:

    B::type
    tag_of<B>::type

One reason for this is that there may be more than one type of proto expression that matches a given predicate grammar. For example, either an expression matching T0 or one matching T1 matches or_<T0,T1>. There is one special case of a predicate grammar, the wild-card predicate:

    proto::wildcardns_::_

which matches any proto expression. This is analogous to a Boolean predicate that always returns true no matter what its argument. Other important predicate grammars (in the proto::control namespace) are not_, and_, and or_. AFAICT, just as any predicate logic expression can be constructed using only:

    true  not  and  or

the equivalent of any other predicate grammar, such as if_, can be constructed using just these four:

    _  not_  and_  or_
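For instance (a minimal sketch; the header path and exact namespaces are from my reading of the vault snapshot and may differ):

    #include <boost/xpressive/proto/proto.hpp>
    #include <boost/mpl/assert.hpp>

    namespace proto = boost::proto;
    using proto::_;

    // A predicate grammar built only from _, not_, and and_:
    // matches any terminal except a terminal<int>.
    struct NonIntTerm
      : proto::and_<
            proto::terminal< _ >
          , proto::not_< proto::terminal< int > >
        >
    {};

    typedef proto::terminal< char >::type CharTerm;
    typedef proto::terminal< int >::type  IntTerm;

    BOOST_MPL_ASSERT(( proto::matches< CharTerm, NonIntTerm > ));
    BOOST_MPL_ASSERT_NOT(( proto::matches< IntTerm, NonIntTerm > ));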
* What is your evaluation of the potential usefulness of the library?

Very.

* Did you try to use the library? With what compiler? Did you have any problems?

Yes:

  http://www.boost-consulting.com/vault/index.php?&directory=Strings%20-%20Text%20Processing
  &filename=cfg_lookahead_extends.zip
  &filename=toy_attract.pair.zip

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

In-depth, although I still have a long way to go.

* Are you knowledgeable about the problem domain?

Somewhat; however, I'm obviously unfamiliar with the specific problems proto was designed to solve. For example, I was unaware of some of the proto viewpoints expressed by Eric in the last 2 paragraphs of this post:

  http://lists.boost.org/Archives/boost/2008/03/134959.php

* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

Eventually, but not yet. There needs to be more documentation debugging and also more exploration of alternative designs; in particular, designs guided by algebraic morphisms and category theory. One justification for this exploration is the morphism template in the attachment to:

  http://lists.boost.org/Archives/boost/2008/03/134964.php

Although that post observed that that morphism template:

  is correct in the limited case of those dual-purpose
  grammars-and-metafunctions.

that doesn't mean it couldn't be extended to handle more general cases. In particular, in the case of expression grammars, I'd think the morphism template in the attachment could just be specialized to be an identity morphism:

    template< class Tag, class Args, long Arity >
    struct morphism< expr< Tag, Args, Arity > >
    {
        typedef expr< Tag, Args, Arity > type;
    };

Further specializations, designed to error out when given a predicate grammar, would, AFAICT, correctly calculate, from either a meta-function grammar or an expression grammar, the matching expression type.

Category theory *might* be used to guide the predicate grammar design, as suggested (very indirectly) by parts of:

  http://www.amazon.com/Algebraic-Approaches-Semantics-Monographs-Computer/dp/...

Section 3.3, "The Boolean Algebra of Guards", contains:

  Pascal allows a statement of the form:

      if x>0 and not x=5 then S_1 else S_2

  If S_1,S_2 are to be semantically interpreted as morphisms X->Y in a
  category _C_, how can "P and not Q" be interpreted, where P is "x>0"
  and Q is "x=5"? ... Equivalently, they [i.e. P and Q] may be
  represented as guard functions as in 1.5.20. ... for an object X in a
  partially additive category _C_, there is a subset, Guard(X), of
  _C_(X,X) of "guard morphisms" which form a Boolean algebra.

On the same page where the 1.5.20 mentioned in the above quote appears, there's:

  22 Definition. Let (p1,...,pn) be an n-way test on X and let
  f1,...,fn be in SC(X,Y). Then a natural generalization of the case
  statement in Pascal is:

      case(p1,...,pn) of (f1,...,fn) = f1 p1 + ... + fn pn

where the p's correspond to the lhs of the when's in an or_ with elements decorated with transforms, and the f's correspond to the transforms on the rhs of the corresponding when's.

In conclusion, the somewhat vague similarities, noted above, between proto and algebraic morphisms and, more generally, category theory strongly suggest (to me at least) that there could be a large benefit in trying to make these similarities more concrete and using that knowledge to improve proto's design.
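To make that correspondence a bit more concrete, here is a parting, *very* speculative sketch (G1...Gn and T1...Tn are hypothetical grammars and transforms, standing in for the p's and f's):

    // The n-way case statement, case(p1,...,pn) of (f1,...,fn),
    // rendered in current proto: each Gi plays a guard pi, and each
    // Ti plays the morphism fi applied when that guard matches.
    struct CaseOf
      : proto::or_<
            proto::when< G1, T1 >  // f1 p1
          , proto::when< G2, T2 >  // f2 p2
          // ...
          , proto::when< Gn, Tn >  // fn pn
        >
    {};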