Re: [boost] [fusion] segmented sequences [ was Re: [fusion] ftag ]

25 Sep 2006

"Joel de Guzman" <joel@boost-consulting.com> wrote in message
news:4517F285.2010806@boost-consulting.com...
...
Andy Little wrote:
...
There do seem to be limits to when the compiler will optimise a view though. It
may have something to do with copying references (aka joint_view) as raw bytes,
after which AFAICS the compiler will treat them as pointers. That said, that is
only an impression so far.
Ok, now I understand. You sent me an email about stacked fold of
stacked joint_view created by push_back being difficult for the
compiler to optimize. Here's why (and there's a solution):
joint_view is a prime example of a segmented sequence. The resulting
iterators from it will have some overhead. That is inevitable.
See joint_view_iterator so you'll know what I mean.
However, Eric Niebler did some work on segmented algorithms. His
initial testing (for proto and early spirit-2 experiments) shows
some exciting results. Basically, a segmented sequence is a sequence
of sequences (for each segment). Instead of creating an iterator that
"weaves" through each segment (like joint_view_iterator), segmented
algorithms apply the algorithm recursively to each of the segments.
For example, a fold of a joint_view will be 2 calls to fold, one
for the left join and one for the right.
I know this is a bit hand-wavy, but once you see it in action, it
really makes a lot of sense. I'm CCing Eric. I think now is the right
time to incorporate his work into the fusion code base.
Yep. I don't understand, but it sounds great :-)
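
FWIW, here is how I read the segmented idea described above, as a rough sketch
only. It uses std::tuple rather than Fusion, and the names joint, fold_from,
add and demo_sum are made up for illustration; none of this is Eric's actual
code or Fusion's API. The state type is also kept fixed, which a real fold
would not require. The point is just that a fold over a join becomes one fold
per segment, with no iterator weaving between them.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace seg {

// Fold over one flat segment (a std::tuple), element by element.
// Base case: index has reached the end of the tuple.
template <std::size_t I, class Tuple, class State, class F>
typename std::enable_if<I == std::tuple_size<Tuple>::value, State>::type
fold_from(const Tuple&, State s, F) { return s; }

// Recursive case: combine element I into the state, then move on.
template <std::size_t I, class Tuple, class State, class F>
typename std::enable_if<(I < std::tuple_size<Tuple>::value), State>::type
fold_from(const Tuple& t, State s, F f) {
    return fold_from<I + 1>(t, f(s, std::get<I>(t)), f);
}

template <class... Ts, class State, class F>
State fold(const std::tuple<Ts...>& t, State s, F f) {
    return fold_from<0>(t, s, f);
}

// Hypothetical stand-in for joint_view: two segments held by reference.
template <class Left, class Right>
struct joint {
    const Left& left;
    const Right& right;
};

// Segmented fold: fold the left segment, then feed the result into a
// fold of the right segment. Nested joints simply recurse.
template <class Left, class Right, class State, class F>
State fold(const joint<Left, Right>& j, State s, F f) {
    return fold(j.right, fold(j.left, s, f), f);
}

// Simple accumulating functor with a fixed double state.
struct add {
    template <class T>
    double operator()(double acc, const T& x) const { return acc + x; }
};

// A joint of a joint, roughly what stacked push_backs would produce.
inline double demo_sum() {
    std::tuple<int, double> a(1, 2.5);
    std::tuple<int> b(3);
    std::tuple<long, int> c(4L, 5);
    joint<std::tuple<int, double>, std::tuple<int>> ab{a, b};
    joint<decltype(ab), std::tuple<long, int>> abc{ab, c};
    return fold(abc, 0.0, add());
}

} // namespace seg
```

Calling seg::demo_sum() folds five elements spread over three segments using
three ordinary tuple folds chained through the state.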

OK. FWIW I found that creating a sequence by push_back-ing result types onto an
empty vector is a great way to 'create' a result sequence at compile time. It
kind of creates something out of nothing. At runtime, however, I found I could
employ a different approach by using recursive functors (that is still compile
time rather than runtime recursion, but much faster to compile and using fewer
resources, whereas joint_view can be a resource hog), because now the type of
the result sequence is known. Basically the compile time and runtime stages are
doing different jobs. The compile time fold of push_back works out what the
final sequence will be. The runtime's job is then to figure out the most
efficient way to initialise the sequence that it already knows about. I haven't
yet figured out how to create an initialiser so that the compiler will
initialise the sequence right in the constructor (not that it's impossible, I
just haven't got round to trying), but I guess that would be ideal. The compiler
seems to optimise away the default ctor followed by the fill anyway, but it
would be satisfying if I could express that intent more directly. I guess this
might involve making a view where each element contains an individual view of
the function for initialising that element.
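
To make the two stages concrete, here is a minimal sketch, assuming std::tuple
in place of a Fusion vector. All the names (push_back, result_of_transform,
fill_from, transform_tuple, twice) are hypothetical and only illustrate the
split: a compile time fold of push_back computes the result type, then a
recursive functor fills a default-constructed result at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace init_sketch {

// Compile time stage: push_back on a tuple *type*; no objects are created.
template <class Seq, class T> struct push_back;
template <class... Ts, class T>
struct push_back<std::tuple<Ts...>, T> { typedef std::tuple<Ts..., T> type; };

// Fold push_back over the input types, starting from an empty tuple and
// appending the type F returns for each element.
template <class F, class In, class Out = std::tuple<>>
struct result_of_transform;

template <class F, class Out>
struct result_of_transform<F, std::tuple<>, Out> { typedef Out type; };

template <class F, class Head, class... Tail, class Out>
struct result_of_transform<F, std::tuple<Head, Tail...>, Out>
    : result_of_transform<
          F, std::tuple<Tail...>,
          typename push_back<
              Out,
              decltype(std::declval<F>()(std::declval<const Head&>()))>::type> {};

// Runtime stage: a recursive functor (the recursion is unrolled at compile
// time) that assigns each element of an already known result type.
template <std::size_t I, std::size_t N>
struct fill_from {
    template <class F, class In, class Out>
    static void apply(F f, const In& in, Out& out) {
        std::get<I>(out) = f(std::get<I>(in));
        fill_from<I + 1, N>::apply(f, in, out);
    }
};

template <std::size_t N>
struct fill_from<N, N> {
    template <class F, class In, class Out>
    static void apply(F, const In&, Out&) {}
};

template <class F, class... Ts>
typename result_of_transform<F, std::tuple<Ts...>>::type
transform_tuple(const std::tuple<Ts...>& in, F f) {
    typedef typename result_of_transform<F, std::tuple<Ts...>>::type result;
    result out;                                     // default ctor ...
    fill_from<0, sizeof...(Ts)>::apply(f, in, out); // ... followed by the fill
    return out;
}

// Example operation whose result type differs from its argument type
// for small integers (char + char promotes to int).
struct twice {
    template <class T>
    auto operator()(const T& x) const -> decltype(x + x) { return x + x; }
};

static_assert(
    std::is_same<result_of_transform<twice, std::tuple<char, int, float>>::type,
                 std::tuple<int, int, float>>::value,
    "result type is worked out without constructing anything");

} // namespace init_sketch
```

This still does the default construction followed by a fill, so it does not
express the "initialise right in the constructor" intent; it only shows the
type computation and the runtime initialisation as separate jobs.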

Anyway, it might be worth thinking about not copying the compile time behaviour
too slavishly at runtime, as these may be very different situations.

Second, my analysis of copying is not based on much science. In VC8 the
assembler code gets pretty big, and later functions tend to join the output of
earlier functions. With 20 MB assembler files I tended to have ideas halfway
through reading and never got to the end of the story to see what the
compiler finally did, if that makes any sense. However, it may well be that the
compiler does keep track of the stuff it's copying. It just gets a bit
frightening seeing rep movsd instructions on bigger and bigger joint_views
(where these might typically have 16 elements, each with 2 references), with
bigger and bigger arguments, but OTOH I suspect once it has to copy then the
game is up.

The other question is whether Fusion scales. My guess is that the compile time
should flatten out once the compiler starts seeing the same types, but OTOH this
business of accumulating the result type may be 'interesting'. Finding out
requires writing some applications, so that is what I'm currently doing.

regards
Andy Little