[preprocessor] Sequences vs. All Other Data Structures

Hello all.

In the past, I've said several times that sequences are superior to every other data structure when dealing with a collection of elements (tuples are better in some very specific contexts). One of the reasons is that, if a library is built for it, non-unary (or even variadic) elements are natural because the structure of the sequence itself encapsulates each element:

(int)(std::pair<int, int>)(double)

Another reason is that, if a library is built for it and if placemarkers a la C99/C++11 are available, there is a natural representation of the nil sequence:

/**/

...i.e. nothing. This leads to very natural appends, prepends, and joins, since these operations are simply variations on juxtaposition:

s (e)
(e) s
s1 s2

I'll use these in the examples that follow.

Another reason, which I'll demonstrate in this post, is that the very structure of a sequence usually contains the potential "processing power" to process it. By "processing power," I'm referring to algorithmic steps (i.e. headroom)--which can be a finite resource in preprocessor metaprogramming. The structure of the sequence itself allows what I've called "sequential iteration."

#define A(...) + B
#define B(...) - A

A(1)(2)(3) // + - + B

(The reason that this works is because B is not invoked "within" A and vice versa.)

So, to convert a sequence to a comma-separated list, you can just do something like

#define CSV(seq) CSV_A seq
#define CSV_A(...) __VA_ARGS__ CSV_B
#define CSV_B(...) , __VA_ARGS__ CSV_C
#define CSV_C(...) , __VA_ARGS__ CSV_B

CSV( (1)(2)(3) ) // 1, 2, 3 CSV_B

Unfortunately, we still have to get rid of the spurious CSV_A, CSV_B, or CSV_C (depending on the length of the sequence) that comes out the end. That's easily done with concatenation.

#define CSV(seq) \
    CHAOS_PP_REVERSE_CAT(0, CSV_A seq) \
    /**/
#define CSV_A(...) __VA_ARGS__ CSV_B
#define CSV_B(...) , __VA_ARGS__ CSV_C
#define CSV_C(...) , __VA_ARGS__ CSV_B
#define CSV_A0
#define CSV_B0
#define CSV_C0

CSV( (1)(2)(3) ) // 1, 2, 3

The REVERSE_CAT is used here so that the commas don't have to be encoded. Essentially, it's defined as

REVERSE_CAT(a, ...) => __VA_ARGS__ ## a

except that it allows its arguments to expand. A regular CAT could be used, but you'd have to delay the existence of the commas.

Now *if* the number of scans applied to a macro argument is guaranteed by the definition of CAT (as it is with CHAOS_PP_CAT), one can do something like this:

#define CC CHAOS_PP_DEFER(CHAOS_PP_COMMA)()

#define CSV(seq) \
    CHAOS_PP_CAT(CSV_A seq, 0) \
    /**/
#define CSV_A(...) __VA_ARGS__ CSV_B
#define CSV_B(...) CC __VA_ARGS__ CSV_C
#define CSV_C(...) CC __VA_ARGS__ CSV_B
#define CSV_A0
#define CSV_B0
#define CSV_C0

CSV( (1)(2)(3) ) // 1, 2, 3

The DEFER macro used here delays the expansion of the COMMA() invocation through exactly one scan, which is all that CAT applies to its argument. The CC is used here only as an abbreviation so that the example doesn't get too cluttered. This latter version is significantly more verbose, but this type of thing is sometimes unavoidable, especially when dealing with unmatched parentheses rather than commas--which is exactly what we'll run into next.
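Before moving on, it may help to see a rough expansion trace of the REVERSE_CAT version above. This is a hand-waved sketch of the intermediate forms, not literal scanner output:

// CSV( (1)(2)(3) )
// -> CHAOS_PP_REVERSE_CAT(0, CSV_A (1)(2)(3))
// -> CHAOS_PP_REVERSE_CAT(0, 1 CSV_B (2)(3))
// -> CHAOS_PP_REVERSE_CAT(0, 1 , 2 CSV_C (3))
// -> CHAOS_PP_REVERSE_CAT(0, 1 , 2 , 3 CSV_B)
// -> 1 , 2 , 3 CSV_B0   // the trailing macro name ## 0
// -> 1 , 2 , 3          // CSV_B0 is defined to expand to nothing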
Let's say we wanted to calculate the length of the sequence.

#define LP CHAOS_PP_DEFER(CHAOS_PP_LPAREN)()
#define RP CHAOS_PP_DEFER(CHAOS_PP_RPAREN)()

#define SIZE(seq) \
    CHAOS_PP_CAT(SIZE_A seq, 0) 0 CHAOS_PP_CAT(SIZE_C seq, 0) \
    /**/
#define SIZE_A(...) CHAOS_PP_INC LP SIZE_B
#define SIZE_B(...) CHAOS_PP_INC LP SIZE_A
#define SIZE_A0
#define SIZE_B0
#define SIZE_C(...) RP SIZE_D
#define SIZE_D(...) RP SIZE_C
#define SIZE_C0
#define SIZE_D0

SIZE( (1)(2)(3) ) // CHAOS_PP_INC(CHAOS_PP_INC(CHAOS_PP_INC(0)))

(As with CC above, the LP and RP macros are just sugar to avoid clutter in the example.) Essentially, we've iterated the sequence twice. The first time, we generated CHAOS_PP_INC( for each element. The second time, we generated ) for each element. However, this doesn't "finish the job" because an expression like

MACRO CHAOS_PP_DEFER(CHAOS_PP_LPAREN)() ... )

causes the invocation of MACRO to be deferred through two scans, not one. This is actually a good thing in this scenario, because we don't really want CHAOS_PP_INC to be invoked "inside" CAT. To cause the evaluation, we just add another scan:

#define LP CHAOS_PP_DEFER(CHAOS_PP_LPAREN)()
#define RP CHAOS_PP_DEFER(CHAOS_PP_RPAREN)()

#define SIZE(seq) \
    SIZE_I(CHAOS_PP_CAT(SIZE_A seq, 0) 0 CHAOS_PP_CAT(SIZE_C seq, 0)) \
    /**/
#define SIZE_A(...) CHAOS_PP_INC LP SIZE_B
#define SIZE_B(...) CHAOS_PP_INC LP SIZE_A
#define SIZE_A0
#define SIZE_B0
#define SIZE_C(...) RP SIZE_D
#define SIZE_D(...) RP SIZE_C
#define SIZE_C0
#define SIZE_D0
#define SIZE_I(x) x

SIZE( (1)(2)(3) ) // 3

Like BOOST_PP_INC, CHAOS_PP_INC saturates at a finite value, and thus the above implementation can only "correctly" measure sequences of limited length. One could use the unlimited, but more expensive, CHAOS_PP_ARBITRARY_INC instead if necessary.

Generating closing parentheses is common, so we'll generalize it:

#define LP CHAOS_PP_DEFER(CHAOS_PP_LPAREN)()
#define RP CHAOS_PP_DEFER(CHAOS_PP_RPAREN)()

#define CLOSE(seq) CHAOS_PP_CAT(CLOSE_A seq, 0)
#define CLOSE_A(...) RP CLOSE_B
#define CLOSE_B(...) RP CLOSE_A
#define CLOSE_A0
#define CLOSE_B0

The above then becomes

#define SIZE(seq) \
    SIZE_I(CHAOS_PP_CAT(SIZE_A seq, 0) 0 CLOSE(seq)) \
    /**/
#define SIZE_A(...) CHAOS_PP_INC LP SIZE_B
#define SIZE_B(...) CHAOS_PP_INC LP SIZE_A
#define SIZE_A0
#define SIZE_B0
#define SIZE_I(x) x
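To make the two-pass trick concrete, here is a sketch of the intermediate form for a three-element sequence (again hand-waved, not literal scanner output):

// SIZE( (1)(2)(3) )
// argument of SIZE_I, after the first pass and CLOSE:
//   CHAOS_PP_INC LP CHAOS_PP_INC LP CHAOS_PP_INC LP 0 RP RP RP
// after the LP/RP deferrals unwind:
//   CHAOS_PP_INC ( CHAOS_PP_INC ( CHAOS_PP_INC ( 0 ) ) )
// the extra scan supplied by SIZE_I then evaluates this to 3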
Now, let's do something more serious. I recently ran into a scenario where I needed to generate something for each permutation of each element of the powerset of a sequence. For the powerset:

#define POWERSET(seq) \
    POWERSET_1(CHAOS_PP_CAT(POWERSET_T seq, 0) CLOSE(seq)) \
    /**/
#define POWERSET_1(x) x
#define POWERSET_2(e, ps) \
    CHAOS_PP_IIF(CHAOS_PP_SEQ_IS_CONS(ps))( \
        ps CHAOS_PP_SPLIT( \
            1, \
            POWERSET_5(CHAOS_PP_CAT(POWERSET_P ps, 0) e, CLOSE(ps)) \
        ), \
        (e)() \
    ) \
    /**/
#define POWERSET_3(...) POWERSET_4(__VA_ARGS__)
#define POWERSET_4(seq, e, ps) e, (e seq) ps
#define POWERSET_5(...) __VA_ARGS__
#define POWERSET_T(...) POWERSET_2 LP(__VA_ARGS__) CC POWERSET_F
#define POWERSET_F(...) POWERSET_2 LP(__VA_ARGS__) CC POWERSET_T
#define POWERSET_T0
#define POWERSET_F0
#define POWERSET_P(seq) POWERSET_3 LP seq CC POWERSET_Q
#define POWERSET_Q(seq) POWERSET_3 LP seq CC POWERSET_P
#define POWERSET_P0
#define POWERSET_Q0

POWERSET( (a)(b)(c) )
// ( (c) )
// ( )
// ( (b)(c) )
// ( (b) )
// ( (a)(c) )
// ( (a) )
// ( (a)(b)(c) )
// ( (a)(b) )

It is important to note that this implementation is not limited by any library-defined resource. It is limited only by the amount of available compiler resources. With unlimited CPU and memory resources, this algorithm could compute the powerset of a sequence of *any* length.

For generating permutations, I'll use a helper algorithm which I've called "FOCUS". The job of this algorithm is to produce a sequence which isolates each element of the input sequence. For example,

FOCUS((a)(b)(c)(d), ~)
// ( (a)(b)(c), (d), , ~)
// ( (a)(b), (c), (d), ~)
// ( (a), (b), (c)(d), ~)
// ( , (a), (b)(c)(d), ~)

This is kind of a weird form. Essentially, for each element x in the input sequence, the algorithm produces an element in the resulting sequence of the form:

( seq-of-elements-before-x, (x), seq-of-elements-after-x, ~)

#define FOCUS(seq, ...) \
    CHAOS_PP_INLINE_WHEN(CHAOS_PP_SEQ_IS_CONS(seq))( \
        FOCUS_1( \
            (CHAOS_PP_SEQ_HEAD(seq)), \
            CHAOS_PP_SEQ_TAIL(seq), \
            __VA_ARGS__ \
        ) \
    ) \
    /**/
#define FOCUS_1(h, t, ...) \
    FOCUS_4(CHAOS_PP_CAT(FOCUS_T t, 0)(, h, t, __VA_ARGS__) CLOSE(t)) \
    /**/
#define FOCUS_2(seq) FOCUS_3 seq
#define FOCUS_3(l, m, r, ...) \
    (l m, (CHAOS_PP_SEQ_HEAD(r)), CHAOS_PP_SEQ_TAIL(r), __VA_ARGS__) \
    (l, m, r, __VA_ARGS__) \
    /**/
#define FOCUS_4(x) x
#define FOCUS_T(...) FOCUS_2 LP FOCUS_F
#define FOCUS_F(...) FOCUS_2 LP FOCUS_T
#define FOCUS_T0
#define FOCUS_F0

As with POWERSET, this implementation is unlimited with respect to library resources.

Finally, the permutations themselves:

#define PERMUTE(seq) \
    PERMUTE_3(CHAOS_PP_CAT(PERMUTE_P seq, 0) PERMUTE_1(seq,) CLOSE(seq)) \
    /**/
#define PERMUTE_1(seq, pf) \
    CHAOS_PP_IIF( \
        CHAOS_PP_BITAND(CHAOS_PP_SEQ_IS_CONS(seq)) \
            (CHAOS_PP_SEQ_IS_CONS(CHAOS_PP_SEQ_TAIL(seq))))( \
        PERMUTE_2, (pf seq) CHAOS_PP_EAT \
    )(FOCUS(seq, pf)) \
    /**/
#define PERMUTE_1_INDIRECT() PERMUTE_1
#define PERMUTE_2(fs) CHAOS_PP_CAT(PERMUTE_T fs, 0)
#define PERMUTE_3(x) x
#define PERMUTE_4(x) x
#define PERMUTE_T(l, m, r, pf) \
    CHAOS_PP_OBSTRUCT(PERMUTE_1_INDIRECT)()(l r, pf m) PERMUTE_F \
    /**/
#define PERMUTE_F(l, m, r, pf) \
    CHAOS_PP_OBSTRUCT(PERMUTE_1_INDIRECT)()(l r, pf m) PERMUTE_T \
    /**/
#define PERMUTE_T0
#define PERMUTE_F0
#define PERMUTE_P(...) PERMUTE_4 LP PERMUTE_Q
#define PERMUTE_Q(...) PERMUTE_4 LP PERMUTE_P
#define PERMUTE_P0
#define PERMUTE_Q0

PERMUTE((a)(b)(c))
// ( (c)(b)(a) )
// ( (c)(a)(b) )
// ( (b)(c)(a) )
// ( (b)(a)(c) )
// ( (a)(c)(b) )
// ( (a)(b)(c) )

Putting these together:

#define A(s, seq) \
    CHAOS_PP_EXPR_S(s)(CHAOS_PP_SEQ_FOR_EACH_S( \
        s, B, PERMUTE(seq) \
    )) \
    /**/
#define B(s, seq) CHAOS_PP_SEQ_CONCAT(seq)

CHAOS_PP_EXPR(CHAOS_PP_SEQ_FOR_EACH(
    A, POWERSET( (a)(b)(c) )
))
// c cb bc b ca ac a cba cab bca bac acb abc ba ab

This generates

\sum_{k=0}^n\frac{n!}{\left({n-k}\right)!} = e\int_1^\infty t^ne^{-t}\,dt = e\Gamma\left({n + 1, 1}\right)

results (apologies for the LaTeX). For n = 3, as above, that's 16 results (where the 16th is empty, from the nil sequence). For n = 6, that's 1957 results. For n = 7, it's 13700 results--which was enough to ICE g++ running on my VM.
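As a quick sanity check of that count for n = 3 (the 16th result being the empty permutation of the nil subset):

\sum_{k=0}^{3}\frac{3!}{\left({3-k}\right)!} = 1 + 3 + 6 + 6 = 16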
The moral of the story is that sequences are powerful because an input sequence itself often contains enough "potential energy" to process it. It isn't very difficult to produce n^2, 2^n, or even n^n computational steps. I only really needed the case for n = 3, and I ultimately didn't use the above because the cost of a runtime solution was acceptable and not really worth adding the above machinery (and I didn't want to take the time to add it to Chaos), but I thought I'd share it anyway.

Regards, Paul Mensonides

The full implementation follows; compile with

g++ -E -P -std=c++11 (or -std=c++0x) -I $CHAOS_ROOT <filename>

---

#include <chaos/preprocessor/cat.h>
#include <chaos/preprocessor/control/iif.h>
#include <chaos/preprocessor/control/inline_when.h>
#include <chaos/preprocessor/facilities/split.h>
#include <chaos/preprocessor/logical/bitand.h>
#include <chaos/preprocessor/punctuation/comma.h>
#include <chaos/preprocessor/punctuation/paren.h>
#include <chaos/preprocessor/recursion/basic.h>
#include <chaos/preprocessor/recursion/expr.h>
#include <chaos/preprocessor/seq/concat.h>
#include <chaos/preprocessor/seq/core.h>
#include <chaos/preprocessor/seq/for_each.h>
#include <chaos/preprocessor/tuple/eat.h>

#ifndef SEQ
#define SEQ (a)(b)(c)
#endif

#define LP CHAOS_PP_DEFER(CHAOS_PP_LPAREN)()
#define RP CHAOS_PP_DEFER(CHAOS_PP_RPAREN)()
#define CC CHAOS_PP_DEFER(CHAOS_PP_COMMA)()

#define CLOSE(seq) CHAOS_PP_CAT(CLOSE_T seq, 0)
#define CLOSE_T(...) RP CLOSE_F
#define CLOSE_F(...) RP CLOSE_T
#define CLOSE_T0
#define CLOSE_F0

#define POWERSET(seq) \
    POWERSET_1(CHAOS_PP_CAT(POWERSET_T seq, 0) CLOSE(seq)) \
    /**/
#define POWERSET_1(x) x
#define POWERSET_2(e, ps) \
    CHAOS_PP_IIF(CHAOS_PP_SEQ_IS_CONS(ps))( \
        ps CHAOS_PP_SPLIT( \
            1, \
            POWERSET_5(CHAOS_PP_CAT(POWERSET_P ps, 0) e, CLOSE(ps)) \
        ), \
        (e)() \
    ) \
    /**/
#define POWERSET_3(...) POWERSET_4(__VA_ARGS__)
#define POWERSET_4(seq, e, ps) e, (e seq) ps
#define POWERSET_5(...) __VA_ARGS__
#define POWERSET_T(...) POWERSET_2 LP(__VA_ARGS__) CC POWERSET_F
#define POWERSET_F(...) POWERSET_2 LP(__VA_ARGS__) CC POWERSET_T
#define POWERSET_T0
#define POWERSET_F0
#define POWERSET_P(seq) POWERSET_3 LP seq CC POWERSET_Q
#define POWERSET_Q(seq) POWERSET_3 LP seq CC POWERSET_P
#define POWERSET_P0
#define POWERSET_Q0

// --

#define FOCUS(seq, ...) \
    CHAOS_PP_INLINE_WHEN(CHAOS_PP_SEQ_IS_CONS(seq))( \
        FOCUS_1( \
            (CHAOS_PP_SEQ_HEAD(seq)), CHAOS_PP_SEQ_TAIL(seq), \
            __VA_ARGS__ \
        ) \
    ) \
    /**/
#define FOCUS_1(h, t, ...) \
    FOCUS_4(CHAOS_PP_CAT(FOCUS_T t, 0)(, h, t, __VA_ARGS__) CLOSE(t)) \
    /**/
#define FOCUS_2(seq) FOCUS_3 seq
#define FOCUS_3(l, m, r, ...) \
    (l m, (CHAOS_PP_SEQ_HEAD(r)), CHAOS_PP_SEQ_TAIL(r), __VA_ARGS__) \
    (l, m, r, __VA_ARGS__) \
    /**/
#define FOCUS_4(x) x
#define FOCUS_T(...) FOCUS_2 LP FOCUS_F
#define FOCUS_F(...) FOCUS_2 LP FOCUS_T
#define FOCUS_T0
#define FOCUS_F0

// --

#define PERMUTE(seq) \
    PERMUTE_3(CHAOS_PP_CAT(PERMUTE_P seq, 0) PERMUTE_1(seq,) CLOSE(seq)) \
    /**/
#define PERMUTE_1(seq, pf) \
    CHAOS_PP_IIF( \
        CHAOS_PP_BITAND(CHAOS_PP_SEQ_IS_CONS(seq)) \
            (CHAOS_PP_SEQ_IS_CONS(CHAOS_PP_SEQ_TAIL(seq))))( \
        PERMUTE_2, (pf seq) CHAOS_PP_EAT \
    )(FOCUS(seq, pf)) \
    /**/
#define PERMUTE_1_INDIRECT() PERMUTE_1
#define PERMUTE_2(fs) CHAOS_PP_CAT(PERMUTE_T fs, 0)
#define PERMUTE_3(x) x
#define PERMUTE_4(x) x
#define PERMUTE_T(l, m, r, pf) \
    CHAOS_PP_OBSTRUCT(PERMUTE_1_INDIRECT)()(l r, pf m) PERMUTE_F \
    /**/
#define PERMUTE_F(l, m, r, pf) \
    CHAOS_PP_OBSTRUCT(PERMUTE_1_INDIRECT)()(l r, pf m) PERMUTE_T \
    /**/
#define PERMUTE_T0
#define PERMUTE_F0
#define PERMUTE_P(...) PERMUTE_4 LP PERMUTE_Q
#define PERMUTE_Q(...) PERMUTE_4 LP PERMUTE_P
#define PERMUTE_P0
#define PERMUTE_Q0

// --

#define A(s, seq) \
    CHAOS_PP_EXPR_S(s)(CHAOS_PP_SEQ_FOR_EACH_S( \
        s, B, PERMUTE(seq) \
    )) \
    /**/
#define B(s, seq) CHAOS_PP_SEQ_CONCAT(seq)

CHAOS_PP_EXPR(CHAOS_PP_SEQ_FOR_EACH(
    A, POWERSET(SEQ)
))
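Since the listing keys off the SEQ macro, the input can be changed from the command line without editing the file--for example (shell quoting assumed):

g++ -E -P -std=c++11 -I $CHAOS_ROOT -DSEQ='(x)(y)' <filename>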

Slightly off-topic: are there any plans to add Chaos to Boost? Thanks, Nate

on Sat Apr 21 2012, Nathan Ridge <zeratul976-AT-hotmail.com> wrote:
Slightly off-topic: are there any plans to add Chaos to Boost?
+1 -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Sat, 21 Apr 2012 17:42:25 -0400, Dave Abrahams wrote:
on Sat Apr 21 2012, Nathan Ridge <zeratul976-AT-hotmail.com> wrote:
Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost. Regards, Paul Mensonides

Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost.
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost. Regards, Nate

On Sun, 22 Apr 2012 03:10:29 +0000, Nathan Ridge wrote:
Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost.
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library. Regards, Paul Mensonides

Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost.
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together. Regards, Nate

On Sun, 22 Apr 2012 07:51:11 +0000, Nathan Ridge wrote:
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together.
I doubt the latter would happen. Essentially, because Chaos cannot be used when targeting VC++, Boost cannot itself use it. Without Boost itself using it, the target audience is too small. Now, if Boost said "screw VC++" and used Chaos anyway, that *might* do it. However, it would break Boost on more compilers than just VC++, and then we'd more likely just get a Python 2.x vs. 3.x situation (apologies, Dave, if I'm mischaracterizing the current Python state), except with Boost.

Now, IFF there is a place in Boost for a library like Chaos which currently contains no workarounds and henceforth must not contain workarounds, then I'd be willing. However, there is no current analog of that in Boost, nor any Boost policy addressing that type of requirement, nor any means of enforcement (even a would-be automated enforcement such as grepping for _MSC_VER (etc.) wouldn't work, because even slightly rearranging code in a less than (theoretically) ideal way is still a workaround).

Regards, Paul Mensonides

On 4/22/2012 5:45 AM, Paul Mensonides wrote:
On Sun, 22 Apr 2012 07:51:11 +0000, Nathan Ridge wrote:
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together.
I doubt the latter would happen. Essentially, because Chaos cannot be used when targeting VC++, Boost cannot itself use it.
This is not completely true. Even though it would provide more work for a library implementor, a library could choose to use Boost Chaos for compilers that support it and choose to use Boost PP for compilers which do not ( including VC ). In that case if Boost Chaos were better for preprocessor metaprogramming than Boost PP, as evidenced by Boost implementors using Boost Chaos instead of Boost PP, it would provide impetus for compiler vendors to make their preprocessor 100% C++11 compliant (and no, I do not expect VC++ to ever make an effort to be compliant but that is neither here nor there).
Without Boost itself using it, the target audience is too small. Now, if Boost said "screw VC++" and used Chaos anyway, that *might* do it. However, it would break Boost on more compilers than just VC++, and then we'd more likely just get a Python 2.x vs. 3.x situation (apologies, Dave, if I'm mischaracterizing the current Python state), except with Boost.
Now, IFF there is a place in Boost for a library like Chaos which currently contains no workarounds and henceforth must not contain workarounds, then I'd be willing. However, there is no current analog of that in Boost, nor any Boost policy addressing that type of requirement, nor any means of enforcement (even a would-be automated enforcement such as grepping for _MSC_VER (etc.) wouldn't work, because even slightly rearranging code in a less than (theoretically) ideal way is still a workaround).
Just my opinion but... I think there should be a place in Boost for implementations which are supported by a subset of compilers. I also see nothing wrong with a library which only supports compilers which implement some area of the standard, and provides zero workarounds for compilers which do not implement that area of the standard 100% correctly.

On 22.04.2012, at 21:05, Edward Diener wrote:
On 4/22/2012 5:45 AM, Paul Mensonides wrote:
On Sun, 22 Apr 2012 07:51:11 +0000, Nathan Ridge wrote:
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together.
I doubt the latter would happen. Essentially, because Chaos cannot be used when targeting VC++, Boost cannot itself use it.
This is not completely true. Even though it would provide more work for a library implementor, a library could choose to use Boost Chaos for compilers that support it and choose to use Boost PP for compilers which do not ( including VC ).
What advantage would that give the library implementor over just using Boost PP? Sebastian

On 4/22/2012 3:21 PM, Sebastian Redl wrote:
On 22.04.2012, at 21:05, Edward Diener wrote:
On 4/22/2012 5:45 AM, Paul Mensonides wrote:
On Sun, 22 Apr 2012 07:51:11 +0000, Nathan Ridge wrote:
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together.
I doubt the latter would happen. Essentially, because Chaos cannot be used when targeting VC++, Boost cannot itself use it.
This is not completely true. Even though it would provide more work for a library implementor, a library could choose to use Boost Chaos for compilers that support it and choose to use Boost PP for compilers which do not ( including VC ).
What advantage would that give the library implementor over just using Boost PP?
The same advantage C++ programmers get when they use standard library containers instead of C arrays. They would be using better technology and promoting that technology. I do know that they would still have to use Boost PP for compilers that do not completely implement the preprocessor correctly for C++11, but there is a gratification in using better technology when one can. Being held back from using better technology because of compiler deficiencies, even in a multi-compiler environment, is not my idea of pleasurable programming.

[Edward Diener]
In that case if Boost Chaos were better for preprocessor metaprogramming than Boost PP, as evidenced by Boost implementors using Boost Chaos instead of Boost PP, it would provide impetus for compiler vendors to make their preprocessor 100% C++11 compliant (and no, I do not expect VC++ to ever make an effort to be compliant but that is neither here nor there).
If someone could prepare *minimized* test cases demonstrating the VC bugs that are holding you back here, I could file compiler bugs. STL

On 4/22/2012 8:43 PM, Stephan T. Lavavej wrote:
[Edward Diener]
In that case if Boost Chaos were better for preprocessor metaprogramming than Boost PP, as evidenced by Boost implementors using Boost Chaos instead of Boost PP, it would provide impetus for compiler vendors to make their preprocessor 100% C++11 compliant (and no, I do not expect VC++ to ever make an effort to be compliant but that is neither here nor there).
If someone could prepare *minimized* test cases demonstrating the VC bugs that are holding you back here, I could file compiler bugs.
One of the main issues is that Microsoft follows the C89 standard when it comes to the preprocessor. That has been superseded in both C and C++ a long time ago. So what does one say to a company that follows a very old standard and defends the inadequacy of its preprocessor by saying that, according to that standard, they are processing preprocessor symbols correctly? It is like a company saying that they implement a computer language of 23 years ago and, despite the fact that the language has changed drastically since that time, they are proud not to keep up with those changes.

A second issue is that simple test cases are often unavailable, since the sort of failures exhibited by Microsoft's preprocessor occur when one is attempting to use the facilities of the Boost PP library, and that is often at the cutting edge of what one can do with the preprocessor (aside from Chaos, which assumes a 100% compliant preprocessor).

I will try to provide some simple test cases sometime in the coming week or weekend, although perhaps Paul can jump in here and do so more easily.

On Sun, 22 Apr 2012 21:51:06 -0400, Edward Diener wrote:
On 4/22/2012 8:43 PM, Stephan T. Lavavej wrote:
If someone could prepare *minimized* test cases demonstrating the VC bugs that are holding you back here, I could file compiler bugs.
One of the main issues is that Microsoft follows the C89 standard when it comes to the preprocessor. That has been superseded in both C and C++ a long time ago. So what does one say to a company that follows a very old standard and defends the inadequacy of its preprocessor by saying that, according to that standard, they are processing preprocessor symbols correctly? It is like a company saying that they implement a computer language of 23 years ago and, despite the fact that the language has changed drastically since that time, they are proud not to keep up with those changes.
It isn't that. VC++ doesn't even implement the preprocessor correctly for C89. Essentially, the macro expansion algorithm appears to be fundamentally broken. One small example:

#define A() 123
#define B() ()

A B() // should expand to A() (and does)

#define C() A B()

C() // should *still* expand to A() (but instead expands to 123)

The entire methodology is borked here. Leaving blue paint aside (and argument processing aside), macro expansion should work like a stream editor.

+ + C ( ) + +
^
+ + C ( ) + +
  ^
+ + C ( ) + +
    ^
+ + A B ( ) + +
    ^

THIS is "rescanning and further replacement." Rescanning and further replacement is *not* a recursive process that occurs "in the replacement list."

+ + A B ( ) + +
      ^
+ + A ( ) + +
      ^         // note: not at 'A', scanning doesn't go backward
+ + A ( ) + +
        ^
+ + A ( ) + +
          ^
+ + A ( ) + +
            ^
+ + A ( ) + +
              ^

In the above, everything to the left of the caret is output and everything to the right is input. For the top-level source, that output goes to the underlying language parser (or to the preprocess-only output). The only time it isn't is when an argument to a macro--which is *actually used* at least once in the replacement list without being an operand of # or ##--is processed. In that case, a recursive scan of the sequence of tokens making up the argument is necessary, and the output of the scan is redirected to a buffer (depending on exact implementation methodology) for future substitution into a copy of the replacement list.

Things get more complicated when dealing with blue paint and disabling contexts. A disabling context is a range associated with a macro name that exists only during a scan for macro expansion--i.e. if that scan is redirected to a buffer for substitution, the result in the buffer does not contain annotations representing the disabling context. If an identifier token which matches the macro name is found in the range during the scan, it is "painted blue." A painted token cannot be used as the name of a macro in a macro invocation. However, while the context that causes blue paint is transient, blue paint on a token is not. It remains even when the scan output is redirected for substitution. Note that substitution ultimately means that the sequence of tokens in the argument is going to get scanned again. When that happens, the contexts from the first scan are gone, but any blue paint on particular tokens is not.
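That persistence is easy to see in a two-line sketch (F here is a hypothetical macro; the behavior shown is the conforming one):

#define F(x) F(x + 1)

F(0)    // -> F(0 + 1); the resulting F is painted and never re-expands
F(F(0)) // -> F(F(0 + 1) + 1); the paint picked up while the argument was
        //    scanned survives substitution into the outer replacement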
As a concrete example,

#define A(x) #x B(x) C(x) D(x)
#define B(x) x
#define C(x)
#define D(x) #x A(x) B(x)

A(B(1))

Where M' = a painted M token, { ... } is the hideset H (which may be flags in the symbol table), +M is a virtual token which means H := H | M, and -M is a virtual token which means H := H \ M...

[begin]
A ( B ( 1 ) ) EOF
^ {}
[begin]
B ( 1 ) EOF
^ {}
[begin]
1 EOF
^ {}
1 EOF
  ^ {}
[end]
+B 1 -B EOF
^ {}
1 -B EOF
^ { B }
1 -B EOF
  ^ { B }
1 EOF
  ^ {}
[end]
+A "B(1)" B ( 1 ) C ( 1 ) D ( 1 ) -A EOF
^ {}
"B(1)" B ( 1 ) C ( 1 ) D ( 1 ) -A EOF
^ { A }
"B(1)" B ( 1 ) C ( 1 ) D ( 1 ) -A EOF
       ^ { A }
[begin]
1 EOF
^ { A }
1 EOF
  ^ { A }
[end]
"B(1)" +B 1 -B C ( 1 ) D ( 1 ) -A EOF
       ^ { A }
"B(1)" 1 -B C ( 1 ) D ( 1 ) -A EOF
       ^ { A, B }
"B(1)" 1 -B C ( 1 ) D ( 1 ) -A EOF
         ^ { A, B }
"B(1)" 1 C ( 1 ) D ( 1 ) -A EOF
         ^ { A }
"B(1)" 1 +C -C D ( 1 ) -A EOF
         ^ { A }
"B(1)" 1 -C D ( 1 ) -A EOF
         ^ { A, C }
"B(1)" 1 D ( 1 ) -A EOF
         ^ { A }
[begin]
1 EOF
^ { A }
1 EOF
  ^ { A }
[end]
"B(1)" 1 +D "1" A ( 1 ) B ( 1 ) -D -A EOF
         ^ { A }
"B(1)" 1 "1" A ( 1 ) B ( 1 ) -D -A EOF
         ^ { A, D }
"B(1)" 1 "1" A ( 1 ) B ( 1 ) -D -A EOF
             ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) B ( 1 ) -D -A EOF
                ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) B ( 1 ) -D -A EOF
                  ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) B ( 1 ) -D -A EOF
                    ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) B ( 1 ) -D -A EOF
                      ^ { A, D }
[begin]
1 EOF
^ { A, D }
1 EOF
  ^ { A, D }
[end]
"B(1)" 1 "1" A' ( 1 ) +B 1 -B -D -A EOF
                      ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) 1 -B -D -A EOF
                      ^ { A, B, D }
"B(1)" 1 "1" A' ( 1 ) 1 -D -A EOF
                        ^ { A, D }
"B(1)" 1 "1" A' ( 1 ) 1 -A EOF
                        ^ { A }
"B(1)" 1 "1" A' ( 1 ) 1 EOF
                        ^ {}
[end]

VC++ gets the result correct here (though it doesn't get it by the correct method). The algorithm is bizarre in relation to typical code interpreters, but it is actually shockingly simple. The only times that it gets tricky are when a -M occurs within the sequence of tokens that makes up an invocation--which I glossed over above (because they don't occur in the above). E.g. scenarios like

MACRO -M ( ARG )

or

MACRO ( ARG, -M ARG )

In both of these cases, the -M must get executed prior to MACRO(...) being replaced by MACRO's replacement list (and prior to resuming scanning). The first of these is not very tricky--and both boost-pp and chaos-pp require this. The algorithm merely has to execute the virtual token when finding the left parenthesis. The second is more tricky (and I don't think even chaos-pp requires this one to work correctly) because the algorithm has to *correctly* deal with possibly four different scenarios:

1. The argument is not used (any -M must still be executed therein).
2. The argument is used as an operand of # (any -M must still be executed therein).
3. The argument is used as an operand of ## (any -M must still be executed and tokens must be painted as required).
4. The argument is used as a non-operand of # or ## (recursive scan).

Worse, 2, 3, and 4 can all be used at the same time:

#define A(x) #x x ## x x

However, in 2 and 3, you cannot encounter a +M, so you can just push a +M into a "reset" stack for every -M you find and then execute the reset stack prior to processing an argument (provided the recursive scan case is done last).

VC++'s macro expansion algorithm appears to be a combination of optimizations and hacks that yield correct results for simple cases, but often yields incorrect results where the input requires the actual rules to be followed. Obvious examples are the way __VA_ARGS__ is "optimized" into a single argument and the extra scan applied in the following, causing it to yield 123 instead of A()--which is presumably a hack of some kind.

#define A() 123
#define B() ()
#define C() A B()

C()

However, these are only the tip of the iceberg. I have a suspicion that the algorithm used is actually totally unlike what it is supposed to be--i.e. I'd guess it's a re-write-the-algorithm rather than a fix-a-few-bugs scenario.
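For reference, the __VA_ARGS__ "optimization" mentioned above can be reproduced with a sketch this small (hypothetical macros; the comment describes conforming behavior):

#define VA(...) COUNT(__VA_ARGS__)
#define COUNT(a, b, c) c

VA(1, 2, 3) // should expand to 3; VC++ at the time passed the expanded
            // __VA_ARGS__ to COUNT as a single argument, so COUNT saw one
            // argument instead of three (hence the usual extra-scan hacks)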
A second issue is that simple test cases are often unavailable, since the sort of failures exhibited by Microsoft's preprocessor occur when one is attempting to use the facilities of the Boost PP library, and that is often at the cutting edge of what one can do with the preprocessor (aside from Chaos, which assumes a 100% compliant preprocessor).
Edward is correct here. The problems often occur in a combinatorial fashion. This is almost certainly due to the actual algorithm operating in a way very foreign to how it should, so when simple examples are used, the output appears correct--except that you cannot see the algorithm state (i.e. hidesets, blue paint, etc.), which is often *not* correct. So, when things are used in combination, what appears to be correct in isolation is actually carrying forward algorithm state that affects further processing incorrectly. These types of scenarios are inordinately difficult to find workarounds for, and any workarounds are inherently unstable (because you don't know exactly what they are "fixing"). All it really takes is some different input or different combination and things might break.

Chaos has over 500 primary interface macros and over two thousand primary and secondary interface macros (i.e. SEQ_FOR_EACH is primary, SEQ_FOR_EACH_S is secondary (derivative))--not to mention the number of implementation macros (which is probably in the 10000 range, though I haven't counted). Full circle, even if it was a matter of testing permutations and combinations of primary interfaces used only once, eΓ(500 + 1, 1) is still about 3.3 x 10^1134 possible combinations. Aka, not tenable. The way boost-pp deals with this is essentially by having a completely separate implementation for MSVC which tries to do everything it can to force things to happen when they should or before it is too late--which frequently doesn't work.

<rant>
Since I've been doing this pp-stuff, which is well over a decade, I've seen numerous compiler and tool vendors fix or significantly improve their preprocessors--several of which have informally consulted with me. However, I have seen absolutely zero effort by MS. The stock response for a bug report is "closed as won't fix" for anything related to the preprocessor. I've heard--sometimes on this list--numerous people who represent MS say that "they'll see what they can do," but nothing ever happens. Fix your preprocessor, MS, which will probably gain you nothing financially, and stop implementing extensions to the language in lieu of implementing the language itself, and I'll actually maybe believe that you care about C++, standards, and ethics.
</rant>

Regards, Paul Mensonides

[Paul Mensonides]
It isn't that. VC++ doesn't even implement the preprocessor correctly for C89. Essentially, the macro expansion algorithm appears to be fundamentally broken. One small example:

#define A() 123
#define B() ()

A B() // should expand to A() (and does)

#define C() A B()

C() // should *still* expand to A() (but instead expands to 123)
I have filed this as DevDiv#407151 "VC's preprocessor considered harmful (to Boost.Preprocessor)" in our internal database, including a copy of your entire mail. If it were within my power to fix this for you, I would - but I am not a compiler dev. Stephan T. Lavavej Visual C++ Libraries Developer

On Tuesday, April 24, 2012 5:22 AM, Stephan T. Lavavej wrote:
[Paul Mensonides]
It isn't that. VC++ doesn't even implement the preprocessor correctly for C89. Essentially, the macro expansion algorithm appears to be fundamentally broken. One small example:

#define A() 123
#define B() ()

A B() // should expand to A() (and does)

#define C() A B()

C() // should *still* expand to A() (but instead expands to 123)
I have filed this as DevDiv#407151 "VC's preprocessor considered harmful (to Boost.Preprocessor)" in our internal database, including a copy of your entire mail.
If it were within my power to fix this for you, I would - but I am not a compiler dev.
Stephan's action seems as helpful as he can be. Paul Mensonides obviously feels strongly about this, but to many this may seem an obscure language backwater. Paul explains why it is not just a minor detail and why it is causing a lot of downstream grief. (But I haven't understood whether there will be grief from existing macro code if Microsoft fix it? Is this the reason for their lack of enthusiasm?)

Is there anything Boost can do to support Paul (and Stephan) by sending a collective message to Microsoft, lest they dismiss this as one man's rant?

Paul

--- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

on Tue Apr 24 2012, "Paul A. Bristow" <pbristow-AT-hetp.u-net.com> wrote:
Is there anything Boost can do to support Paul (and Stephan) by sending a collective message to Microsoft, lest they dismiss this as one man's rant?
I've already gotten Herb's attention about it. Where it goes after that, I can't say. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Paul A. Bristow wrote:
Is there anything Boost can do to support Paul (and Stephan) by sending a collective message to Microsoft, lest they dismiss this as one man's rant?
+1. I'd sign a "petition" for fixing major bugs in the MSVC pp. --Lorenzo

[Paul A. Bristow]
But I haven't understood whether there will be grief from existing macro code if Microsoft fix it?
Probably (VC preprocesses billions of lines of code). But if I were a compiler dev, I'd deal with it with pragmas (superior to compiler options, because a header could push/pop requests for preprocessor conformance).
Is there anything Boost can do to support Paul (and Stephan) by sending a collective message to Microsoft, lest they dismiss this as one man's rant?
For this and other bugs, it is helpful to hear: "I maintain Boost.Foobar"/"I maintain Product X, with Y users and Z revenue" and "this bug is bad/really bad/kitten-vaporizing" (ideally, quantify how many kittens are vaporized).

In the absence of such information, we try to figure out how severe a bug is, which factors into its priority (along with the cost of fixing it, available devs, etc.). Severity is usually pretty obvious: silent bad codegen is worse than rejects-valid, which is worse than accepts-invalid, etc. But sometimes a bug looks deceptively innocuous, which is where additional info can help.

For example, I filed DevDiv#166326 "N3276: decltype v1.1", about a subtle corner case in a new-to-VC10 feature. Ordinarily, that sounds like low priority. (EDG and GCC's lack of support for this at the time further decreased the priority.) But the info that (a) this is a huge obstacle to implementing result_of (for bonus points, argued by a WG21 paper), (b) Boost (i.e. Eric Niebler) really wants it, and (c) the STL really wants it too, pushed it "over the bar" and it got fixed. (Which later made me super happy as I was going through customer-reported STL issues and found that one was dependent on decltype v1.1.)

One note, though - commenting on *resolved* Connect bugs generally doesn't do any good. It's not obvious, but we aren't notified of Connect comments. Comments on active bugs will typically be looked at when the bug is resolved, but we have no reason to look at already-resolved bugs in our database (not even I have time for that, which is why I resolve Connect bugs with my E-mail address).

STL

On Tuesday, April 24, 2012 8:46 PM, Stephan T. Lavavej wrote:
[Paul A. Bristow]
But I haven't understood whether there will be grief from existing macro code if Microsoft fix it?
Probably (VC preprocesses billions of lines of code). But if I were a compiler dev, I'd deal with it with pragmas (superior to compiler options, because a header could push/pop requests for preprocessor conformance).
Is there anything Boost can do to support Paul (and Stephan) by sending a collective message to Microsoft, lest they dismiss this as one man's rant?
For this and other bugs, it is helpful to hear:
"I maintain Boost.Foobar"/"I maintain Product X, with Y users and Z revenue" and "this bug is bad/really bad/kitten-vaporizing" (ideally, quantify how many kittens are vaporized)
I'm immensely relieved to know that Microsoft is so sensitive to issues involving vaporization of kittens ;-) Paul PS I wouldn't put this issue in that category, but I think it is much more important than it might seem. --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

On Wed, 25 Apr 2012 12:08:23 +0100, Paul A. Bristow wrote:
Stephan T. Lavavej wrote:
"I maintain Boost.Foobar"/"I maintain Product X, with Y users and Z revenue" and "this bug is bad/really bad/kitten-vaporizing" (ideally, quantify how many kittens are vaporized)
PS I wouldn't put this issue in that category, but I think it is much more important than it might seem.
In terms of direct user base, the issue is not that big. In terms of indirect user base (i.e. users of other libraries in Boost which use Boost.Preprocessor), the issue is larger, but in that case, it's an issue that mostly affects Boost authors. Given the skill level of the typical Boost author (and the inclination to innovate), it's a shame when a great deal of time is wasted applying workarounds for compiler issues. It increases the complexity of everything, and, in some cases, actually affects interface.

Such workarounds need not be preprocessor-related, and, on the whole, the preprocessor is *not* as important as the core language. However, it is also not *nearly* as complex as the core language--and therefore not nearly as difficult to do correctly. In fact, in the last 24 hours, I implemented a macro expansion algorithm that, AFAICT, works flawlessly--even in all the corner cases. (Grain of salt... I haven't tested it very well, however, and it doesn't have a lexer or parser attached to it, so it is not a "preprocessor"--it just takes a sequence of preprocessing tokens and a set of existing macro definitions, and does the macro replacement therein.) The point of the above is that it is simply not that hard to do it right if you actually know what doing it right is and take the time to do it. Of course, doing it right also implies obeying the phases of translation....

I would have less of a problem with all of this if it was just about prioritization of fixes of implemented features and addition of missing features of the language. But that doesn't appear to be the case. Instead, other extra-linguistic (and more easily marketable) things appear to take priority away from the language itself. Not that those are necessarily the same dev teams, but there are integration issues, nonetheless. I have an even larger problem with "won't fix because that's the way it is, so deal with it."

As it currently stands, I will not use cl, nor will I make sure code is portable to VC++. Do I want variadic templates? Absolutely, but while I'm waiting I can emulate variadic templates for VC++ if I have a decent preprocessor. In fact, the preprocessor is the fundamental means of dealing with compiler deficiencies and system discrepancies.

Regards, Paul Mensonides
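To make the "emulate variadic templates" remark concrete, this is the usual Boost.PP shape of such an emulation (a minimal sketch; the tuple template here is hypothetical):

#include <boost/preprocessor/arithmetic/inc.hpp>
#include <boost/preprocessor/repetition/enum_params.hpp>
#include <boost/preprocessor/repetition/repeat.hpp>

// Generates tuple0<T0>, tuple1<T0, T1>, tuple2<T0, T1, T2>, ...
#define TUPLE_DEF(z, n, _) \
    template<BOOST_PP_ENUM_PARAMS(BOOST_PP_INC(n), class T)> \
    struct tuple ## n { /* members for T0 ... Tn */ };

BOOST_PP_REPEAT(3, TUPLE_DEF, ~)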

On 4/25/2012 5:01 AM, Paul Mensonides wrote:
However, it is also not *nearly* as complex as the core language--and therefore not nearly as difficult to do correctly. In fact, in the last 24 hours, I implemented a macro expansion algorithm that, AFAICT, works flawlessly--even in all the corner cases. (Grain of salt... I haven't tested it very well, however, and it doesn't have a lexer or parser attached to it, so it is not a "preprocessor"--it just takes a sequence of preprocessing tokens and a set of existing macro definitions, and does the macro replacement therein.)
Implementation of the above attached--probably lots of room for improvement and optimization. Apologies for the mega function (in particular--I didn't feel like refactoring) and the (non-portable) shell color-coding (I wanted to see blue paint). I built it with g++ 4.7.0, but I believe 4.6+ should also work.

$ g++ -std=c++11 -I $CHAOS_ROOT 1.cpp

If this wasn't just a toy, I'd probably replace the symbol table with a trie (radix tree) populated during lexical analysis--especially with how many common prefixes you get in C++. Also, I'd avoid the various string comparisons and just compare iterators--which would be faster and improve locality. Also, for output to a tty (i.e. preprocess only) there needs to be a state machine to judiciously insert whitespace to prevent erroneous re-tokenization by a later tool. Such generated whitespace can no longer affect the semantics of the program (whereas before, it can, thanks to stringizing).

As one can see from this code, a recursive call to the macro replacement scanner only occurs in one spot: when preparing an actual argument for a formal parameter that is used in the replacement "in the open". Aside from that, the blue paint, and context changes (implemented here via virtual tokens), this is a classical stream editor. Macro invocations are found in the stream, replaced (in the stream) by their replacement lists (without macro replacement), and scanning resumes at the first token of the replacement list.

The input is (given no lexer/parser, it is manually put into the code):

#define O 0 +X ## Y A
#define A() A() B
#define B() B() A
#define C(x, y) x ## y
#define D(...) D D ## __VA_ARGS__ __VA_ARGS__ #__VA_ARGS__
#define ID(...) __VA_ARGS__
#define P(p, x) p ## x(P

O()
A()()()
C(C,)
D(D(0, 1))
ID ( 1 )
P(,ID)(1,2),P(1,2)))

The output is:

0 <space> + XY <space> A ( ) <space> B <newline>
A ( ) <space> B ( ) <space> A ( ) <space> B <newline>
C <newline>
D <space> DD ( 0 , <space> 1 ) <space> D <space> D0 , <space> 1 <space> 0 , <space> 1 <space> "0, 1" <space> "D(0, 1)" <newline>
<space> <tab> 1 <space> <newline>
P ( 1 , 2 ) , 12 ( P ) <newline>

which is correct. g++ outputs:

0 +XY A() B
A() B() A() B
C
D DD(0, 1) D D0, 1 0, 1 "0, 1" "D(0, 1)"
1
P(1,2),12(P)

which is correct. cl outputs:

0 +XY A() B
A() B() A() B
C
D DD D0, 1 0, 1 "0, 1" D D0, 1 0, 1 "0, 1" "D(0, 1)"
1
12(P,12(P)

where the 4th and 6th are wrong. wave outputs:

0 +XY A() B
A() B() A() B
C
D DD(0, 1) D D0, 1 0, 1 "0, 1" "D(0, 1)"
1
error: improperly terminated macro invocation or replacement-list terminates in partial macro expansion (not supported yet): missing ')'

Regards, Paul Mensonides

On 4/26/2012 5:55 PM, Paul Mensonides wrote:
Implementation of the above attached--probably lots of room for improvement and optimization. Apologies for the mega function (in particular--I didn't feel like refactoring) and the (non-portable) shell color-coding (I wanted to see blue paint). I built it with g++ 4.7.0, but I believe 4.6+ should also work.
$ g++ -std=c++11 -I $CHAOS_ROOT 1.cpp
Updated attached. This one executes some more realistic tests. (I must say, having no lexer to tokenize input and no parser to process directives makes input a pain in the ass.)

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8

#define EXPR_S(s) PRIMITIVE_CAT(EXPR_, s)
#define EXPR_0(...) __VA_ARGS__
#define EXPR_1(...) __VA_ARGS__
#define EXPR_2(...) __VA_ARGS__
#define EXPR_3(...) __VA_ARGS__
#define EXPR_4(...) __VA_ARGS__
#define EXPR_5(...) __VA_ARGS__
#define EXPR_6(...) __VA_ARGS__
#define EXPR_7(...) __VA_ARGS__
#define EXPR_8(...) __VA_ARGS__
#define EXPR_9(...) __VA_ARGS__

#define SPLIT(i, ...) PRIMITIVE_CAT(SPLIT_, i)(, __VA_ARGS__)
#define SPLIT_0(p, a, ...) p ## a
#define SPLIT_1(p, a, ...) p ## __VA_ARGS__

#define IS_VARIADIC(...) \
    SPLIT(0, CAT(IS_VARIADIC_, IS_VARIADIC_C __VA_ARGS__)) \
    /**/
#define IS_VARIADIC_C(...) 1
#define IS_VARIADIC_IS_VARIADIC_C 0,
#define IS_VARIADIC_1 1,

#define NOT(x) IS_VARIADIC(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ()

#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define IF(c) IIF(BOOL(c))

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(id) id DEFER(EMPTY)()
#define EAT(...)

#define REPEAT_S(s, n, m, ...) \
    IF(n)(REPEAT_I, EAT)(, OBSTRUCT(), INC(s), DEC(n), m, __VA_ARGS__) \
    /**/
#define REPEAT_INDIRECT() REPEAT_S
#define REPEAT_I(p, _, s, n, m, ...) \
    EXPR_S _(s)( \
        REPEAT_INDIRECT _()( \
            s, n, p ## m, p ## __VA_ARGS__ \
        ) \
        p ## m _(s, n, p ## __VA_ARGS__) \
    ) \
    /**/

#define COMMA() ,
#define COMMA_IF(n) IF(n)(COMMA, EAT)()

#define A(s, i, id) \
    COMMA_IF(i) \
    template<EXPR_S(s)(REPEAT_S(s, INC(i), B, ~))> class id ## i \
    /**/
#define B(s, i, _) COMMA_IF(i) class

EXPR_S(0)(REPEAT_S(0, 3, A, T))

The attached implementation yields correct results. g++ yields

template< class> class T0 , template< class , class> class T1 , template< class , class , class> class T2

which is correct. wave yields

template<class> class T0 , template<class , class> class T1 , template<class , class , class> class T2

which is correct. Unsurprisingly, VC++ errors.

Regards, Paul Mensonides
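The engine of this test is the s-indexed EXPR ladder; roughly (a sketch of the mechanism, not literal output):

// REPEAT_I defers its recursive REPEAT_S call (through the obstructed
// placeholder passed as _) so the current scan leaves it uninvoked;
// EXPR_<s+1>, being a *different* macro than EXPR_<s> and therefore never
// painted blue, supplies the extra scan that finally invokes it. Each
// nesting level consumes one rung, so the EXPR_0..EXPR_9 ladder bounds
// the recursion depth at ten.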

On Tue, 24 Apr 2012 19:46:15 +0000, Stephan T. Lavavej wrote:
[Paul A. Bristow]
But I haven't understood whether there will be grief from existing macro code if Microsoft fix it?
Probably (VC preprocesses billions of lines of code). But if I were a compiler dev, I'd deal with it with pragmas (superior to compiler options, because a header could push/pop requests for preprocessor conformance).
I actually don't think this would work very well. Remember we're talking about macros here and, in particular, libraries which *define* macros for users to use. To handle that in a way that scales with pragmas (or even _Pragmas), you'd have to be "remembering" the pragma state of the definition and changing the algorithm mid-replacement (which sounds like more bugs, not less). Even if that worked, I wouldn't use pragmas either--especially not pragmas to make the compiler do what it already should do by default.

The best case scenario, if you want to support legacy mode: a compiler switch to *disable* a compliant preprocessor. I.e. do the right thing _by default_ (just like with for-loop scoping). That, of course, means fixing the Windows headers (which, BTW, need to be printed, the prints burned, and then deleted for their wanton macro abuse).

If the preprocessor works the way it should, most macro uses that currently work will still work. The ones that won't will either be fancier than is typical or rely on bad (or missing) implementation of the early phases of translation (e.g. macro-formed comments).

-Paul

[Paul Mensonides]
The best case scenario, if you want to support legacy mode: compiler switch to *disable* a compliant preprocessor. I.e. do the right thing _by default_ (just like with for-loop scoping).
Yeah, you're right. I take back what I said about pragmas, I see that they would be especially unworkable for macros.
wanton macro abuse
I was inured to defending against macroized min/max, but we recently discovered that Yield was macroized too - fun. STL
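For anyone following along, the familiar defenses against the Windows min/max macros look like this (a sketch; NOMINMAX covers only those two):

#define NOMINMAX     // suppresses the min/max macros...
#include <windows.h> // ...when defined before this include
#include <algorithm>

int hi = (std::max)(1, 2); // parentheses also work: a parenthesized name is
                           // never treated as a function-like macro invocation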

On Fri, 27 Apr 2012 04:17:37 +0000, Stephan T. Lavavej wrote:
[Paul Mensonides]
The best case scenario, if you want to support legacy mode: compiler switch to *disable* a compliant preprocessor. I.e. do the right thing _by default_ (just like with for-loop scoping).
Yeah, you're right. I take back what I said about pragmas, I see that they would be especially unworkable for macros.
wanton macro abuse
I was inured to defending against macroized min/max, but we recently discovered that Yield was macroized too - fun.
Not just min/max. Many, many Windows API names too (i.e. A vs. W). Obviously this is an issue for C without overloading, but there could be separate C++ headers that have overloaded inline thunks. Otherwise, all non-local macros (meaning anything but locally defined, used, and undefined ones) should be prefixed *and* all uppercase. The Windows headers aren't special (neither are other library headers... I'm looking at you, Qt and Linux (e.g. "major")) and should obey the appropriate conventions. The Windows headers are simply garbage from the dark ages that is inflicted on way too many users. (The same actually goes for Qt's ridiculous signals/slots moc mechanism.)

Library authors: get it out of your heads that your library is "special." In the particular cases of Windows and Qt, the use of either should be an insignificant part of any software. Otherwise, the software is fundamentally poorly designed. This is true even for preprocessor metaprogramming libraries. The contract library referenced in this thread, Boost.Python, the MPL, etc., all use Boost.Preprocessor, but the use is ultimately incidental--an implementation detail. By definition, a library simply cannot tell how pervasive or important it will be in a client or in a client's client (and so on). That is *one* of the things that makes library building hard--you often don't know whether optimization in any particular case is necessary.

Regards, Paul Mensonides

On 4/24/2012 12:22 AM, Stephan T. Lavavej wrote:
[Paul Mensonides]
It isn't that. VC++ doesn't even implement the preprocessor correctly for C89. Essentially, the macro expansion algorithm appears to be fundamentally broken. One small example:

#define A() 123
#define B() ()

A B() // should expand to A() (and does)

#define C() A B()

C() // should *still* expand to A() (but instead expands to 123)
I have filed this as DevDiv#407151 "VC's preprocessor considered harmful (to Boost.Preprocessor)" in our internal database, including a copy of your entire mail.
If it were within my power to fix this for you, I would - but I am not a compiler dev.
If you look at the Microsoft bug reporting page for preprocessor errors, you will see other problems with the preprocessor. A number of these are basically the same problem with variadic macros, but others are just basic erroneous processing of preprocessor input, and Microsoft's response often is that it is the way the VC++ preprocessor works and will not be changed, even if it is erroneous. Here are some links:

https://connect.microsoft.com/VisualStudio/feedback/details/385034/preproces...
https://connect.microsoft.com/VisualStudio/feedback/details/380090/variadic-...
https://connect.microsoft.com/VisualStudio/feedback/details/548580/c-preproc...
https://connect.microsoft.com/VisualStudio/feedback/details/676585/bug-in-cl...
https://connect.microsoft.com/VisualStudio/feedback/details/718976/msvc-prep...
https://connect.microsoft.com/VisualStudio/feedback/details/318940/macro-exp...
https://connect.microsoft.com/VisualStudio/feedback/details/306850/c-preproc...
https://connect.microsoft.com/VisualStudio/feedback/details/288202/preproces...
https://connect.microsoft.com/VisualStudio/feedback/details/177051/invalid-p...

I know it is overkill to list these here, but it really is not done as an attempt to embarrass you or Microsoft. Rather, it is proof of what Paul has said: that Microsoft has taken the view that it is not important to fix the preprocessor. Even if Microsoft were concerned with breaking backward compatibility, i.e. breaking code which uses the incorrect implementation, they could create a correct preprocessor and provide a compiler switch to choose between the backward-compatibility mode and the correct preprocessor mode.

[Edward Diener]
Here are some links:
Filed as DevDiv#407841 "VC's preprocessor considered harmful, Round 2". Remember that I am a user of the compiler, like you (and the STL greatly stresses the compiler, although not as much as Boost). Compiler bugs affect me like anyone else - in fact, more than anyone else, except Boost. You do not need to convince me that they should be fixed. :-> STL

From: stl@exchange.microsoft.com
[Edward Diener]
Here are some links:
Filed as DevDiv#407841 "VC's preprocessor considered harmful, Round 2".
Remember that I am a user of the compiler, like you (and the STL greatly stresses the compiler, although not as much as Boost). Compiler bugs affect me like anyone else - in fact, more than anyone else, except Boost. You do not need to convince me that they should be fixed. :->
Has there been any favourable response to these bug reports? Thanks, Nate

on Sun Apr 22 2012, Paul Mensonides <pmenso57-AT-comcast.net> wrote:
On Sun, 22 Apr 2012 07:51:11 +0000, Nathan Ridge wrote:
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
Because it is not just MSVC that's the problem, and I know what the tendency will be. This will work with compiler XYZ with *just* this little workaround.... Any workaround whatsoever in Chaos is absolutely unacceptable to me. It ruins the very point of the library.
The library could be proposed to Boost with the explicit understanding that it is intended to work only with fully standards-conforming preprocessors. In the long run its presence in Boost might even contribute to putting pressure on vendors of non-conformant preprocessors to get their act together.
I doubt the latter would happen. Essentially, because Chaos cannot be used when targeting VC++, Boost cannot itself use it. Without Boost itself using it, the target audience is too small. Now, if Boost said "screw VC++" and used Chaos anyway, that *might* do it. However, it would break Boost on more compilers than just VC++, and then we'd more likely just get a Python 2.x vs. 3.x situation (apologies, Dave, if I'm mischaracterizing the current Python state) except with Boost.
Now, IFF there is a place in Boost for a library like Chaos which currently contains no workarounds and henceforth must not contain workarounds, then I'd be willing.
As far as I'm concerned, there are two places: 1. As a zero-workaround sub-library of Boost.PP 2. After review and acceptance as a standalone library.
However, there is no current analog of that in Boost, nor any Boost policy addressing that type of requirement, nor any means of enforcement (even would-be automated enforcement such as grepping for _MSC_VER (etc.) wouldn't work, because even slightly rearranging code in a less than (theoretically) ideal way is still a workaround).
It does leave me wondering what you will do when you discover an ambiguity in the standard that two implementations interpret differently, but that's a bridge you can cross when you come to it. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

on Sat Apr 21 2012, Nathan Ridge <zeratul976-AT-hotmail.com> wrote:
Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost.
I don't see why one compiler's lack of standards-conformance should prevent a useful library from becoming part of Boost.
In principle, there is no reason. I would much rather have Chaos available in Boost even with zero workarounds. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

________________________________
From: Paul Mensonides <pmenso57@comcast.net>
To: boost@lists.boost.org
Sent: Saturday, April 21, 2012 10:17 PM
Subject: Re: [boost] [preprocessor] Sequences vs. All Other Data Structures
On Sat, 21 Apr 2012 17:42:25 -0400, Dave Abrahams wrote:
on Sat Apr 21 2012, Nathan Ridge <zeratul976-AT-hotmail.com> wrote:
Slightly off-topic: are there any plans to add Chaos to Boost?
+1
As some sort of technological preview? It is not portable enough to be used within Boost, though perhaps a slew of regression failures will cause that to change (unlikely). <rant>MS, in particular, is too busy taking it upon themselves to "ensure C++ is relevant" to prioritize implementing the language.</rant> Also, one of the principles of Chaos is to have absolutely zero compiler workarounds, and that would have to stay if it was ever part of Boost.
There are workarounds for most of the library. How well do active arguments work with MSVC? Are there possible workarounds? I know I used the PP_DEFER macros before in MSVC, which worked quite well. I used them to actually work around the fact that MSVC implements the "and, not, or, etc." keywords as macros, which can make it difficult to cat them, or to detect whether the keyword was there or not. So the macros by themselves would expand to their appropriate operator, but if they were expanded inside of the PP_CONTEXT() macro, then they would expand to something else. Like this:

#define not PP_CONTEXT_DEFINE(PP_MSVC_NOT, !)

not             // expands to !
PP_CONTEXT(not) // expands to PP_MSVC_NOT

So I used the PP_DEFER macro to delay the macro expansion for one scan. Then it detects if PP_CONTEXT is being expanded recursively, and then chooses the alternate macro. There is also a PP_EXPAND that is needed as well. It can work through several scans, too, by having a nested PP_DEFER for each scan (and a corresponding PP_EXPAND). I actually got the idea for this from the CHAOS_PP_RAIL and CHAOS_PP_WALL macros, which are used for a different purpose. But anyways, I wonder if it's even possible (with workarounds) to implement recursion from Chaos in MSVC?
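For reference, the deferral mechanism itself is tiny. A minimal sketch (PP_EMPTY/PP_DEFER/PP_EXPAND are the conventional definitions, which also appear later in this thread; X is a hypothetical macro used only for illustration):

#define PP_EMPTY(...)
#define PP_DEFER(m) m PP_EMPTY()
#define PP_EXPAND(...) __VA_ARGS__

#define X() 123

PP_DEFER(X)()            // one scan leaves X () -- the PP_EMPTY() that
                         // separated X from its parentheses is gone, but
                         // X was not invoked during that scan
PP_EXPAND(PP_DEFER(X)()) // the extra scan supplied by PP_EXPAND invokes
                         // X(), yielding 123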

But anyways, I wonder if it's even possible (with workarounds) to implement recursion from Chaos in MSVC?
The point is that I don't care. By definition, Chaos = NO WORKAROUNDS.
I understand that Chaos won't implement workarounds. I was thinking more of forking Chaos and trying to implement it in MSVC. Would that be a possibility? And with your knowledge of Chaos's implementation and MSVC's limitations, do you think it would even be remotely feasible? Thanks, Paul Fultz II

On Sun, 22 Apr 2012 00:09:52 -0700, paul Fultz wrote:
I understand that Chaos won't implement workarounds. I was thinking more of forking Chaos and trying to implement it in MSVC. Would that be a possibility? And with your knowledge of Chaos's implementation and MSVC's limitations, do you think it would even be remotely feasible?
I don't think it's feasible. You *might* be able to emulate the base interface over a completely separate (and much larger) implementation. However, you won't be able to emulate the extensibility model. For example, one of the defining things about Chaos is its ability to generalize recursion (not referring to the sequence tricks that I started this thread with). That generalization of recursion reaches into client code. This model is quite different from (e.g.) the Boost pp-lib where you have

end-user -> boost-pp

This end-user might be a library, of course, and there may be a few exceptions, but not really. The pp-lib itself is incapable of reusing itself properly--how many times have there been complaints that algorithm XYZ is not reentrant? Essentially, the boost-pp model is just not extensible. In Chaos, however, you have a model that is more like regular software:

end-user -> { chaos-pp, 3rd-party-library }
3rd-party-library -> { chaos-pp, 3rd-party-library }

The closest reference to the core language that either boost-pp or chaos-pp makes is probably ENUM_PARAMS. However, there is a huge "unexplored" area between the relatively low-level preprocessor libraries like boost-pp and chaos-pp and the higher-level semantics of the core language, with scenarios that are more specific than the low-level libraries but still library-izable (!!). For example, any powerset contains the nil set. However, one doesn't always want it in the result. So, I might do:

#define A(s, seq) \
    CHAOS_PP_INLINE_WHEN(CHAOS_PP_SEQ_IS_CONS(seq))( \
        /* DO WHATEVER */ \
    ) \
    /**/

CHAOS_PP_EXPR(CHAOS_PP_SEQ_FOR_EACH(
    A, POWERSET(SEQ)
))

However, even though /* DO WHATEVER */ may not be generalizable, the predication is. So, I might instead do something like:

#define FILTER(s, e, p, ...) \
    FILTER_I( \
        CHAOS_PP_OBSTRUCT(), CHAOS_PP_NEXT(s), e, p, \
        CHAOS_PP_NON_OPTIONAL(__VA_ARGS__), \
        CHAOS_PP_PACK_OPTIONAL(__VA_ARGS__) \
    ) \
    /**/
#define FILTER_I(_, s, e, p, m, pack) \
    CHAOS_PP_INLINE_WHEN _( \
        CHAOS_PP_CALL(p)()(s, p, e CHAOS_PP_EXPOSE(pack)) \
    )( \
        CHAOS_PP_CALL(m)()(s, m, e CHAOS_PP_EXPOSE(pack)) \
    ) \
    /**/

#define A(s, seq) /* DO WHATEVER */

CHAOS_PP_EXPR(CHAOS_PP_SEQ_FOR_EACH(
    FILTER, POWERSET(SEQ),
    CHAOS_PP_SEQ_IS_CONS_(CHAOS_PP_ARG(1)), A
))

This isn't perfect--it doesn't deal with non-unary sequences, for example. However, that could be accomplished with similar sorts of chaining. Now, in the above, SEQ_FOR_EACH is reentrant, the predicate called by FILTER can use FILTER, and thus FILTER is reentrant. The macro A can be made reentrant (if necessary). And all of these reentrancies are bouncing around between potentially multiple libraries and user code (let's say CHAOS_PP_* is provided by Chaos (C), FILTER is provided by library A, POWERSET is provided by library B (which somehow uses library A), and the end-user (E) does the above). This works because of the extensibility of the architecture. Emulating the interface is one thing--which still might not be possible with MSVC. Emulating the extensibility with MSVC, in particular, is either impossible or difficult beyond belief. Regards, Paul Mensonides

I don't think it's feasible. You *might* be able to emulate the base interface over a completely separate (and much larger) implementation. However, you won't be able to emulate the extensibility model. For example, one of the defining things about Chaos is its ability to generalize recursion (not referring to the sequence tricks that I started this thread with).

One of the key things I would like to implement in MSVC is the recursion backend from Chaos, so there wouldn't be the need to implement a plethora of macros to make one macro re-entrant--which, I suppose, doesn't look feasible in MSVC.
The closest reference to the core language that either boost-pp or chaos-pp makes is probably ENUM_PARAMS. However, there is a huge "unexplored" area between the relatively low-level preprocessor libraries like boost-pp and chaos-pp and the higher-level semantics of the core language, with scenarios that are more specific than the low-level libraries but still library-izable (!!).
I think one of the other "unexplored" areas of the preprocessor is the development of DSLs. It would be nice if there were a high-level library solution for this. Thanks, Paul Fultz II

I understand that Chaos won't implement workarounds. I was thinking more of forking Chaos and trying to implement it in MSVC. Would that be a possibility? And with your knowledge of Chaos's implementation and MSVC's limitations, do you think it would even be remotely feasible?
I don't think it's feasible. You *might* be able to emulate the base interface over a completely separate (and much larger) implementation. However, you won't be able to emulate the extensibility model. For example, one of the defining things about Chaos is its ability to generalize recursion (not referring to the sequence tricks that I started this thread with).
Well, actually this code here will work in MSVC:

#include <boost\preprocessor.hpp>

#define PP_EMPTY(...)
#define PP_EXPAND(...) __VA_ARGS__
#define PP_WHEN(x) BOOST_PP_IF(x, PP_EXPAND, PP_EMPTY)
#define PP_DEFER(m) m PP_EMPTY()               // delays invocation of m by one scan
#define PP_OBSTRUCT(m) m PP_EMPTY PP_EMPTY()() // delays invocation of m by two scans

#define EVAL(...) \
    A(A(A(__VA_ARGS__))) \
    /**/
#define A(...) B(B(B(__VA_ARGS__)))
#define B(...) C(C(C(__VA_ARGS__)))
#define C(...) D(D(D(__VA_ARGS__)))
#define D(...) E(E(E(__VA_ARGS__)))
#define E(...) F(F(F(__VA_ARGS__)))
#define F(...) __VA_ARGS__

#define REPEAT(count, macro, ...) \
    PP_WHEN(count)( \
        REPEAT_INDIRECT PP_OBSTRUCT()()( \
            BOOST_PP_DEC(count), macro, __VA_ARGS__ \
        ) \
        macro PP_OBSTRUCT()( \
            BOOST_PP_DEC(count), __VA_ARGS__ \
        ) \
    ) \
    /**/
#define REPEAT_INDIRECT() REPEAT

#define PARAM(n, ...) BOOST_PP_COMMA_IF(n) __VA_ARGS__ ## n

EVAL( REPEAT(100, PARAM, class T) ) // class T0, class T1, ... class T99

#define FIXED(n, ...) BOOST_PP_COMMA_IF(n) __VA_ARGS__
#define TTP(n, ...) \
    BOOST_PP_COMMA_IF(n) \
    template<REPEAT(BOOST_PP_INC(n), FIXED, class)> class __VA_ARGS__ ## n \
    /**/

EVAL( REPEAT(3, TTP, T) )
// template<class> class T0,
// template<class, class> class T1,
// template<class, class, class> class T2

Of course, I had to add an extra level of scans in the EVAL macro, otherwise it would stop at 62 repetitions. This will now handle up to 182 repetitions. Of course, a more efficient approach is done in Chaos, but perhaps the recursion backend could be implemented in MSVC. It will be slower, but that is OK since I don't develop on MSVC; I just need it to build on MSVC. I will look into this further. Thanks, Paul Fultz II
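An aside for readers following the mechanics: REPEAT never names itself inside its own replacement list. It leaves behind an obstructed REPEAT_INDIRECT ()( ... ), which only turns back into a REPEAT invocation a couple of scans later--by which point the original REPEAT expansion has finished, so the name is no longer disabled ("painted blue"). Each scan supplied by EVAL bounces the recursion one step further. A stripped-down sketch of the same trampoline shape (COUNTDOWN and COUNTDOWN_INDIRECT are hypothetical names, not from the message above; the PP_* helpers and EVAL are those defined above):

#define COUNTDOWN(n) \
    PP_WHEN(n)( \
        n COUNTDOWN_INDIRECT PP_OBSTRUCT()()( \
            BOOST_PP_DEC(n) \
        ) \
    ) \
    /**/
#define COUNTDOWN_INDIRECT() COUNTDOWN

EVAL( COUNTDOWN(3) ) // 3 2 1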

On Mon, 23 Apr 2012 14:08:02 -0700, paul Fultz wrote:
I understand that Chaos won't implement workarounds. I was thinking more of forking Chaos and trying to implement it in MSVC. Would that be a possibility? And with your knowledge of Chaos's implementation and MSVC's limitations, do you think it would even be remotely feasible?
I don't think it's feasible. You *might* be able to emulate the base interface over a completely separate (and much larger) implementation. However, you won't be able to emulate the extensibility model. For example, one of the defining things about Chaos is its ability to generalize recursion (not referring to the sequence tricks that I started this thread with).
Well, actually this code here will work in MSVC:
[snip: code quoted from the previous message]
Of course, I had to add an extra level of scans in the EVAL macro, otherwise it would stop at 62 repetitions. This will now handle up to 182 repetitions. Of course, a more efficient approach is done in Chaos, but perhaps the recursion backend could be implemented in MSVC. It will be slower, but that is OK since I don't develop on MSVC; I just need it to build on MSVC. I will look into this further.
I actually looked at doing something like this with Chaos, but there is no way to stop the exponential number of scans. That's a severe problem, but even if it wasn't, the model is different, as the above is essentially just an uncontrolled continuation machine. Call-and-return semantics are destroyed, as there is no real notion of yielding a result *now* unless you make A reentrant--but then you end up with a *way* higher number of scans, either because you have, say, 16 separate sets like the above, or because you have the above go at least 32 "deep." For example, consider when two things need to occur during the exact same scan because the next scan is going to invoke something that involves both results (anything with half-open parentheses, for sure, but even stuff like conditionals). This, in turn, leads to continuations rather than call-and-return semantics, but then one should just make a continuation machine ala the one in Chaos or Vesa's original one in Chaos' sister project Order. In the right context, continuations are not necessarily a bad thing, but they are definitely a different extensibility model. Regards, Paul Mensonides
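To put a rough number on the "exponential number of scans" (an illustrative aside, not from the original message): each of the six levels of the EVAL tower above composes three copies of the next level, so an argument threaded through EVAL is dragged through on the order of 3^6 = 729 scans--and there is no way for a finished result to opt out early, which is exactly the loss of call-and-return semantics described here.

EVAL( 123 ) // still just 123 -- but only after the tower has re-walked
            // the already-finished token sequence at every level; the
            // scans are spent whether anything remains to expand or not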

On Mon, 23 Apr 2012 14:08:02 -0700, paul Fultz wrote:
Of course, a more efficient approach is done in Chaos, but perhaps the recursion backend could be implemented in MSVC. It will be slower, but that is OK since I don't develop on MSVC; I just need it to build on MSVC. I will look into this further.
Both Boost.Preprocessor and Chaos are open-source, distributed under the highly nonrestrictive Boost license, so you are free to fork off of them as you please. You can do that now, regardless of whether Chaos is part of Boost or not. However, the root problem here is VC++ itself. VC++ needs to stop creating the problem rather than have every client of VC++ working around the problem. In the time it would take to implement Chaos-level functionality (or at least approximate it) using VC++, I could literally write a preprocessor. Politics aside, it is far more work to do the workarounds in the clients--in this case, even with just one client VC -> PP-L--than it would be to fix VC++... even if that fix involves re-writing the macro expansion algorithm entirely. If VC++ won't do it, one is better off changing the root by using a better toolchain or integrating a better preprocessor into the toolchain (such as Wave).

Only a couple of days into discussing adding a no-workaround library like Chaos to Boost, and we're already branching to apply workarounds... And these are my main issues with moving Chaos to Boost--no policy to govern (i.e. disallow) the application of workarounds and no means of stopping the fragmentation implied by branching the code. The current situation is actually better than that. You have a portable subset of functionality (Boost.Preprocessor) and a cutting-edge, but non-portable, superset of functionality (chaos-pp). If (unaltered toolchain) portability is required, then use the portable subset. If not (i.e. you are willing and able to change toolchains or integrate another tool into the toolchain), then use the superior superset.

Despite my ranting, VC++ as a whole is worlds away from where it was circa VC6. However, this appears (to me, at least) to be driven entirely by the bottom line--which only prioritizes based on corporate image and user base size (which is insignificantly small for pp metaprogramming as long as Boost continues to provide workarounds for everything). The bottom line is important, but it is not the only thing that is important. I'd rather have a complete and correct language/library implementation than "auto-vectorization" or AMP (neither of which I'm saying is either good or bad, though I'm very much *not* a fan of language extensions and language subsetting). The priorities should be:

1A. Correctly interpret correct input or fail. (REQ)
1B. Generate correct output. (REQ)
2. Correctly interpret all correct input. (REQ)
3A. Improve performance of output. (QOI)
4A. Improve diagnostic quality. (QOI)
4B. Improve performance of toolchain. (QOI)
5. Create various tools such as IDEs and libraries. (AUX)

The problem with VC++ is that it doesn't appear to prioritize (1A) enough, and (2) appears way down the priority list, if it appears at all. (5) should not really involve the toolchain as far as (1) and (2) are concerned. Instead, what appears to be the case is that (5) is hammered (and usually in a marketing way) in lieu of (2). Tied into that is the long release cycle that incorporates everything under the sun (.NET, VS, C#--and, no, I don't care about C++/CLI or whatever it is currently called). Herb has made indications that this may not continue as it has, but I'll believe that when I see it. Currently, there are other toolchains available on Windows that are more successful at (1A) and (2), though possibly not always at (1B) (I haven't used VC++ significantly for some time, so I am not aware of the number of output bugs).
Regardless of degree of success, there are other toolchains available on Windows whose *order* of priorities more closely reflects the above. For me to even consider providing VC++ workarounds in any code that I write, not only does VC++ have to achieve a reasonable level of success at (1) and (2), I also have to believe that MS's order of priorities going forward is sound--which I don't. MS has dug itself into that hole, and to me the presence of stuff like AMP--which may well be a good research direction--speaks volumes about the prioritization within MS. Now, it may be that my perception is wrong or that things have changed, but my perception and principles are ultimately what I use to make decisions. If more software development operated according to similar principles, the initial cost of software would have been higher, but the cost of software (relative, of course, to what the software does) would be less in the long run (which, by now, would be now). VC's preprocessor would have already been fixed, autotools would not exist, etc. Related to the above: if backward compatibility must be broken, break it as soon as possible. Regards, Paul Mensonides

on Mon Apr 23 2012, Paul Mensonides <pmenso57-AT-comcast.net> wrote:
Only a couple of days into discussing adding a no-workaround library like Chaos to Boost, and we're already branching to apply workarounds... And these are my main issues with moving Chaos to Boost--no policy to govern (i.e. disallow) the application workarounds and no means of stopping the fragmentation implied by branching the code.
Hi Paul, I believe we're going to be moving to a modularized structure in the next few months, with individual per-library repositories. There should be no problem with maintaining absolute hegemonic control over the contents of your library's repository. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

I don't think it's feasible. You *might* be able to emulate the base interface over a completely separate (and much larger) implementation. However, you won't be able to emulate the extensibility model. For example, one of the defining things about Chaos is its ability to generalize recursion (not referring to the sequence tricks that I started this thread with).
The more I'm looking into this, the more it's looking feasible to implement in MSVC. Here is a piece of code that implements a FOR macro using a recursion backend, like what is found in Chaos, and all it took was applying some simple workarounds:

#include <boost\preprocessor.hpp>
#include <boost\preprocessor\detail\is_nullary.hpp>

#define PP_EMPTY(...)
#define PP_EXPAND(...) __VA_ARGS__
#define PP_WHEN(x) BOOST_PP_IF(x, PP_EXPAND, PP_EMPTY)
#define PP_DEFER(m) m PP_EMPTY()
#define PP_OBSTRUCT(m) m PP_EMPTY PP_EMPTY()()

// STATE() deduces the next available EVAL_s level: IS_VALID(s) tests
// whether EVAL_s can still expand (it cannot while it is already in
// flight), and BOOST_PP_AUTO_REC binary-searches for the first valid s.
#define STATE() BOOST_PP_AUTO_REC(IS_VALID, 16)
#define NEXT(s) BOOST_PP_INC(s)
#define IS_VALID(s) \
    BOOST_PP_IS_NULLARY(BOOST_PP_CAT(EVAL_, s)(()))

#define EVAL_S(s) BOOST_PP_CAT(EVAL_, s)
#define EVAL EVAL_S(STATE())

#define EVAL_1(...) EVAL_RET_1((__VA_ARGS__))
#define EVAL_2(...) EVAL_RET_2((__VA_ARGS__))
#define EVAL_3(...) EVAL_RET_3((__VA_ARGS__))
#define EVAL_4(...) EVAL_RET_4((__VA_ARGS__))
#define EVAL_5(...) EVAL_RET_5((__VA_ARGS__))
#define EVAL_6(...) EVAL_RET_6((__VA_ARGS__))
#define EVAL_7(...) EVAL_RET_7((__VA_ARGS__))
#define EVAL_8(...) EVAL_RET_8((__VA_ARGS__))
#define EVAL_9(...) EVAL_RET_9((__VA_ARGS__))
#define EVAL_10(...) EVAL_RET_10((__VA_ARGS__))
#define EVAL_11(...) EVAL_RET_11((__VA_ARGS__))
#define EVAL_12(...) EVAL_RET_12((__VA_ARGS__))
#define EVAL_13(...) EVAL_RET_13((__VA_ARGS__))
#define EVAL_14(...) EVAL_RET_14((__VA_ARGS__))
#define EVAL_15(...) EVAL_RET_15((__VA_ARGS__))
#define EVAL_16(...) EVAL_RET_16((__VA_ARGS__))

#define EVAL_RET_1(x) EVAL_X_1 x
#define EVAL_RET_2(x) EVAL_X_2 x
#define EVAL_RET_3(x) EVAL_X_3 x
#define EVAL_RET_4(x) EVAL_X_4 x
#define EVAL_RET_5(x) EVAL_X_5 x
#define EVAL_RET_6(x) EVAL_X_6 x
#define EVAL_RET_7(x) EVAL_X_7 x
#define EVAL_RET_8(x) EVAL_X_8 x
#define EVAL_RET_9(x) EVAL_X_9 x
#define EVAL_RET_10(x) EVAL_X_10 x
#define EVAL_RET_11(x) EVAL_X_11 x
#define EVAL_RET_12(x) EVAL_X_12 x
#define EVAL_RET_13(x) EVAL_X_13 x
#define EVAL_RET_14(x) EVAL_X_14 x
#define EVAL_RET_15(x) EVAL_X_15 x
#define EVAL_RET_16(x) EVAL_X_16 x

#define EVAL_X_1(...) __VA_ARGS__
#define EVAL_X_2(...) __VA_ARGS__
#define EVAL_X_3(...) __VA_ARGS__
#define EVAL_X_4(...) __VA_ARGS__
#define EVAL_X_5(...) __VA_ARGS__
#define EVAL_X_6(...) __VA_ARGS__
#define EVAL_X_7(...) __VA_ARGS__
#define EVAL_X_8(...) __VA_ARGS__
#define EVAL_X_9(...) __VA_ARGS__
#define EVAL_X_10(...) __VA_ARGS__
#define EVAL_X_11(...) __VA_ARGS__
#define EVAL_X_12(...) __VA_ARGS__
#define EVAL_X_13(...) __VA_ARGS__
#define EVAL_X_14(...) __VA_ARGS__
#define EVAL_X_15(...) __VA_ARGS__
#define EVAL_X_16(...) __VA_ARGS__

#define FOR(pred, op, macro, state) \
    FOR_S(STATE(), pred, op, macro, state) \
    /**/
#define FOR_S(s, pred, op, macro, state) \
    FOR_I( \
        PP_OBSTRUCT(), NEXT(s), \
        pred, op, macro, state \
    ) \
    /**/
#define FOR_INDIRECT() FOR_I
#define FOR_I(_, s, pred, op, macro, state) \
    PP_WHEN _(pred _(s, state))( \
        macro _(s, state) \
        EVAL_S(s) _(FOR_INDIRECT _()( \
            PP_OBSTRUCT _(), NEXT(s), \
            pred, op, macro, op _(s, state) \
        )) \
    ) \
    /**/

#define PRED(s, state) BOOST_PP_BOOL(state)
#define OP(s, state) BOOST_PP_DEC(state)
#define MACRO(s, state) state

EVAL(FOR(PRED, OP, MACRO, 10)) // 10 9 8 7 6 5 4 3 2 1

Paul Mensonides wrote
Hello all.
In the past, I've said several times that sequences are superior to every other data structure when dealing with a collection of elements (tuples are better in some very specific contexts).
Something that I found missing from the Boost.PP docs is the computational complexity (in terms of number of required macro expansions?) of the different algorithms for the different data structures. I take it that sequences have the best performance (i.e., smallest pp time because of the smallest number of macro expansions -- is that true?) but is that true for _all_ sequence algorithms when compared with other data structures? Or are some tuple/array algorithms faster than the pp-seq equivalents? For example, I would have found such an analysis of Boost.PP computational complexity useful when programming complex pp macros (i.e., Boost.Contract). I've ended up using pp-lists everywhere and that was probably a very bad choice (and in fact Boost.Contract compilation times are huge :( ). I've used pp-lists just because they handle nil "for free", but I could have used a pp-sequence (nil)(elem1)(elem2)... -- with a bit more implementation work to deal with the leading (nil) -- had I seen evidence that the pp-sequence implementation would have been truly faster.

One of those reasons is that, if a library is built for it, non-unary (or even variadic) elements are natural because the structure of the sequence itself encapsulates each element. (int)(std::pair<int, int>)(double)

This would be very convenient. I programmed a case-by-case version of these in Boost.Contract but not a general variadic extension of pp-sequences.
Another reason is that, if a library is built for it and if placemarkers ala C99/C++11 are available, there is a natural representation of the nil sequence:
/**/
This would also be very convenient. But automatically handling `(nil)` for nil sequences would be convenient as well (and still faster than pp-lists?). As for adding Chaos to Boost.PP, I asked "why not" before and the issues with the MSVC pp came about. Is this an opportunity to fix MSVC's pp? ;) (Some bugs have already been reported but unfortunately resolved as "won't fix".) Thanks. --Lorenzo
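A concrete sketch of the leading-(nil) convention mentioned above (hypothetical names; this is not Boost.Contract's actual code): a single adapter macro strips the marker before the sequence is handed to ordinary seq algorithms, and the nil case simply vanishes.

// A maybe-empty sequence carries a (nil) head so that the empty case
// is still a non-empty token sequence:
//   (nil)            -- the nil sequence
//   (nil)(a)(b)(c)   -- the sequence (a)(b)(c)

#define STRIP(seq) STRIP_I seq
#define STRIP_I(head) // eats the first element, i.e. the (nil) marker

STRIP( (nil)(a)(b)(c) ) // (a)(b)(c)
STRIP( (nil) )          // expands to nothing

Detecting emptiness can then piggyback on the same marker, e.g. with the kind of IS_CONS predication shown later in the thread.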

On Mon, 23 Apr 2012 07:01:41 -0700, lcaminiti wrote:
Paul Mensonides wrote
Hello all.
In the past, I've said several times that sequences are superior to every other data structure when dealing with a collection of elements (tuples are better in some very specific contexts).
Something that I found missing from the Boost.PP docs is the computational complexity (in terms of number of required macro expansions?) of the different algorithms for the different data structures. I take it that sequences have the best performance (i.e., smallest pp time because of the smallest number of macro expansions -- is that true?) but is that true for _all_ sequence algorithms when compared with other data structures? Or are some tuple/array algorithms faster than the pp-seq equivalents?
Assuming amortized O(1) name-lookup, computational complexity with macro expansion is really about the number of tokens (and whitespace separations) scanned (a particular sequence of tokens is sometimes scanned more than once) and less about the number of macro replacements per se. In terms of raw performance, nesting of macro arguments (i.e. M(M(M(x)))) also plays a significant role, because each argument (provided it is used in the replacement list without being an operand of # or ##) causes a recursive invocation of the scanner (along with whatever memory and cycle requirements that implies), but also because those recursions are not tail recursive. The output of those scans must be cached for later substitution. Generally speaking,

#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

A(123)       // chained: at most one argument cache is live at a time

is more efficient than something like

C(C(C(123))) // nested: all three argument caches are live at once

because the former doesn't build up as much cache (in the typical case) and has fewer caches alive at any one point. For an extremely small case like the above, however, the latter might actually be faster, because the former builds up more hideset bookkeeping (virtual tokens or whatever other mechanism is used). The reason Boost.Preprocessor doesn't provide that information is that it is extremely difficult to derive and is wildly different for different compilers. Chaos doesn't attempt to do it either. Chaos does usually indicate exactly how many recursion steps are required given some input, but that's a measure relative to a different constraint (not performance, per se, but a type of resource). (The Chaos docs actually abuse big-O because I couldn't decide whether I wanted exact forms or big-O, so the docs currently have (e.g.) O(2n) instead of just O(n). I actually need to change those formulas to be auto-generated from LaTeX or use MathML.)
For example, I would have found such an analysis of Boost.PP computational complexity useful when programming complex pp macros (i.e., Boost.Contract). I've ended up using pp-lists everywhere and that was probably a very bad choice (and in fact Boost.Contract compilation times are huge :( ). I've used pp-lists just because they handle nil "for free", but I could have used a pp-sequence (nil)(elem1)(elem2)... -- with a bit more implementation work to deal with the leading (nil) -- had I seen evidence that the pp-sequence implementation would have been truly faster.
Lists in Boost.Preprocessor are the only data structure type that can have a nil state. However, they are worse in nearly every (practical) way that comes to mind. As you mention, in client code, one can simulate nil for the other data types in a variety of ways. Perhaps the most efficient would be something along the lines of:

bseq: (0)(~) or (1)(seq)

// GET bseq
#define GET(b) GET_ ## b
#define GET_0(x)
#define GET_1(seq) seq

// IF_CONS bseq (t, f...)
#define IF_CONS(b) IF_CONS_A_ ## b
#define IF_CONS_A_0(x) IF_CONS_B_0
#define IF_CONS_A_1(x) IF_CONS_B_1
#define IF_CONS_B_0(t, ...) __VA_ARGS__
#define IF_CONS_B_1(t, ...) t

#define EAT(...)

#define A(bseq) IF_CONS bseq (B, EAT) (GET bseq)
#define B(seq) seq !

? A( (0)(~) )         // ? (nil case: EAT swallows the payload)
? A( (1)((a)(b)(c)) ) // ? (a)(b)(c) ! (cons case: B receives the sequence)
One of those reasons is that, if a library is built for it, non-unary (or even variadic) elements are natural because the structure of the sequence itself encapsulates each element.
(int)(std::pair<int, int>)(double)
This would be very convenient. I programmed a case-by-case version of these in Boost.Contract but not a general variadic extension of pp-sequences.
A lot of stuff would have to change for boost-pp to deal with these.
Another reason is that, if a library is built for it and if placemarkers ala C99/C++11 are available, there is a natural representation of the nil sequence:
/**/
This would also be very convenient. But automatically handling `(nil)` for nil sequences would be convenient as well (and still faster than pp-lists?).
Yes, but it is probably better to predicate it as above or similarly. (Reminds me a bit of the (bare-bones) lambda calculus.)
As for adding Chaos to Boost.PP, I asked "why not" before and the issues with the MSVC pp came about. Is this an opportunity to fix MSVC's pp? ;) (Some bugs have already been reported but unfortunately resolved as "won't fix".)
That opportunity has been around for years... Regards, Paul Mensonides
participants (9)
- Dave Abrahams
- Edward Diener
- lcaminiti
- Nathan Ridge
- Paul A. Bristow
- paul Fultz
- Paul Mensonides
- Sebastian Redl
- Stephan T. Lavavej