
"Joel de Guzman" <joel@boost-consulting.com> wrote in message news:4517F285.2010806@boost-consulting.com...
Andy Little wrote:
There do seem to be limits to when the compiler will optimise a view though, It may have something to do with copying references(aka joint-view) as raw bytes, after which AFAICS the compiler will treat them as pointers. that said that is only an impression so far.
Ok, now I understand. You sent me an email about stacked fold of stacked joint_view created by push_back being difficult for the compiler to optimize. Here's why (and there's a solution):
joint_view is a prime example of a segmented sequence. The resulting iterators from it will have some overhead. That is inevitable. See joint_view_iterator so you'll know what I mean.
However, Eric Niebler did some work on segmented algorithms. His initial testing (for proto and early spirit-2 experiments) shows some exciting results. Basically, a segmented sequence is a sequence of sequences (for each segment). Instead of creating an iterator that "weaves" through each segment (like joint_view_iterator), segmented algorithms apply the algorithm recursively to each of the segments. For example, a fold of a joint_view will be 2 calls to fold, one for the left join and one for the right.
I know this is a bit hand-wavy, but once you see it in action, it really makes a lot of sense. I'm CCing Eric. I think now is the right time to incorporate his work into the fusion code base.
Yep. I don't understand,but it sounds great :-) OK. FWIW I found that creating a sequence by pushing_back result_types to an empty vector is a great way to 'create' a result sequence at compile time. It kind of creates something out of nothing. At runtime however I found I could employ a different approach by using recursive functors(that is still compile time rather than runtime recursion, but much faster to compile and using less resources, whereas joint_view can be a resource hog), because now the type of the result sequence is known. Basically the compile time and runtime stages are doing different jobs. The compile time fold push_back is working out what the final sequence will be. Now the runtime's job is to figure out the most efficient way to initialise the sequence, that it already knows about. I havent yet figured how to create an initialiser so that the compiler will initialise the sequence right in the constructor(not that its impossible, just havent got round to trying), but I guess that would be ideal. The compiler is seeming to optimise away the default ctor, followed by the fill anyway, but it would be satisfying if I could express that intent more directly. I guess this might involve making a view, where each element contains an individual view of the function for initialising that element. Anyway it might be worth thinking about not copying too slavishly the compile time behaviour at runtime, as maybe these are very different situations. Second my analysis of copying is based on not too much science. IN VC8 the assembler code gets pretty big and later functions tend to join the output of earlier functions and with 20 Mb assembler files I tended to have ideas half way through reading it and never get to the end of the story to see what the compiler finally did! If that makes any sense. However it may well be that the compiler does keep track of the stuff its copying. It just gets a bit frightening seeing rep movsd instructions on bigger and bigger joint_views (where these might have typically 16 elemnts each with 2 referneces), with bigger and bigger arguments, but OTOH I suspect once it has to copy then the game is up. The other question is whether Fusion scales. My guess is that the compile time should flatten out once the compiler starts seeing the same types, but OTOH this business of accumulating the result type may be 'interesting', but it requires writing some applications to find out, so that is what I'm currently about. regards Andy Little