Re: [boost] GIL Image Processing Library - Fusion and TinyVector

19 Oct 2006

      "Joel de Guzman" <joel@boost-consulting.com> wrote in message
news:eh85r6$ba$1@sea.gmane.org...
...
Ullrich Koethe wrote:
...
Joel de Guzman wrote:
...
...
>>>> VIGRA doesn't have an explicit RGBA type (TinyVector<T, 4> can be used
>>>> instead), because so far no-one came up with a convincing proposal for
>>>> these operations. But without them, RGBA is pretty useless.
>>> Hmmm... TinyVector<T, 4>... I think VIGRA should use Fusion for
>>> that instead ;-)
>> I had a look at Fusion, but I'm not sure whether it would be helpful in
>> this context. TinyVector is based on three design goals: it should support
>> the std::vector interface (except for resize etc.),
> Like boost::array?
>> it should be fast (you
>> have millions of these beasts in a single image),
> Definitely.
>> and it should behave
>> like a built-in arithmetic type (except for division which is problematic
>> because the zero vector is not the only one that may cause a
>> division-by-zero error).
> No problem. But have you seen Andy's work on matrices using fusion?

The work on "tuple" matrices was originally conceived to enable use of my Quan
types in transform matrices:

http://quan.sourceforge.net/quan_matters/doc/html/index.html

However, the IMO more important use is to replace run-time doubles with
compile-time "static" doubles, usually for values of 1 or 0.

The effect of this is to reduce a typical 4 x 4 matrix multiply from 64
multiplies and 48 adds down to, for example, 9 multiplies and 9 adds in the
case of a translation x rotation x translation transform. That is quite a
profound reduction. Similar reductions are of course possible when applying the
transform to vertices.

However there is a problem in VC7.1: the compiler simply runs out of resources
in relatively simple transforms using Fusion, and there is no way round that
with Fusion AFAICS. OTOH there is no such problem in VC8 or gcc4.1.1, the other
2 compilers I tested. Rather than lose VC7.1, I opted to try a hand-rolled
version. IOW I stripped Fusion out completely, removed the iterators, and
provided custom vectors of 3, 9, 4 and 16 elements and custom rows and columns.
This is not quite as neat as Fusion, where one algorithm can theoretically be
applied to any combination of matrices. However, looking at the assembler
output from the hand-made version, I saw that with the programming simplified
and the extra layers of references removed, the compiler now produces what
looks to me perfect. (The example code here is simply a 3x3 rotation matrix
multiplied by itself.)

N.B. As an improvement on perfect, it should also be noted that because this is
a simple test with local constants, the compiler has in fact not instantiated
this assembler code at all in the main function, but simply outputs constants.
(This can be seen in the main assembler at the end.) This is an improvement on
the Fusion version, where I guess the references act as a barrier to some
optimisations and functions were called in main. Be wary of short tests
however ;-)
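
One common way to keep a short test honest, not used in the original test, is
to pass the inputs through a volatile object so the compiler cannot treat them
as known constants. The helper name opaque below is hypothetical; this is only
a sketch of the technique.

```cpp
#include <cassert>

// Returns x, but via a volatile read that the compiler must actually perform,
// so the value cannot be folded into the calling code at compile time.
inline double opaque(double x)
{
    volatile double v = x;
    return v;
}

// Row 0 dot column 0 of the test matrix, with inputs hidden from the optimiser.
inline double row0_dot_col0()
{
    double a = opaque(1.0);
    double b = opaque(2.0);
    double c = opaque(4.0);
    return a * a + b * c; // 1*1 + 2*4
}

inline void example()
{
    assert(row0_dot_col0() == 9.0);
}
```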

Note also the custom at_c functors, which I found useful. These enable the
actual type of the result (reference, const reference or value) to be sorted
out on an element-by-element basis. In fact quanta::as_ref etc. are functors,
so arbitrary functors could be substituted, e.g. multiply by a constant.
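
A rough sketch of how such an access functor might look follows. It is not the
quanta code: vec3, member, times_two and this global at_c are hypothetical
stand-ins, and the trailing return type needs C++11. The policy functor passed
as the second template argument decides whether the element comes back as a
reference, a const reference, a value, or something computed from it.

```cpp
#include <cassert>

// Hypothetical access policies, modelled as functors.
struct as_value     { double operator()(double const& x) const { return x; } };
struct as_const_ref { double const& operator()(double const& x) const { return x; } };
struct as_ref       { double& operator()(double& x) const { return x; } };
// An arbitrary functor can be substituted, e.g. scaling by a constant.
struct times_two    { double operator()(double const& x) const { return 2.0 * x; } };

// A hand-rolled three element vector with named members.
struct vec3 { double e0, e1, e2; };

// Compile-time index to member mapping.
template <int N> struct member;
template <> struct member<0> {
    static double& get(vec3& v) { return v.e0; }
    static double const& get(vec3 const& v) { return v.e0; }
};
template <> struct member<1> {
    static double& get(vec3& v) { return v.e1; }
    static double const& get(vec3 const& v) { return v.e1; }
};
template <> struct member<2> {
    static double& get(vec3& v) { return v.e2; }
    static double const& get(vec3 const& v) { return v.e2; }
};

// Element access: the result type is whatever the policy F returns.
template <int N, class F>
struct at_c {
    template <class V>
    auto operator()(V& v) const -> decltype(F()(member<N>::get(v)))
    {
        return F()(member<N>::get(v));
    }
};

inline void example()
{
    vec3 v = {1.0, 2.0, 3.0};
    at_c<1, as_ref>()(v) = 5.0;               // writes through the reference
    assert(v.e1 == 5.0);
    assert((at_c<2, times_two>()(v) == 6.0)); // computed value, v unchanged
    vec3 const cv = {1.0, 2.0, 3.0};
    double const& r = at_c<0, as_const_ref>()(cv);
    assert(&r == &cv.e0);                     // refers to the stored element
}
```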

IOW in light of this I am not sure now that using Fusion is optimal for what I
want, but it did provide a good starting point and one could see this as
optimising...

Source, with some extraneous stuff, is at the end. The assembler represents the
mux(matrix,matrix) part before it's optimised out in this example. Finally the
main assembler, showing output of a constant.

regards
Andy Little

  00001 dd 02   fld  QWORD PTR [edx]
  00003 dc 09   fmul  QWORD PTR [ecx]
  00005 dd 41 18  fld  QWORD PTR [ecx+24]
  00008 dc 4a 08  fmul  QWORD PTR [edx+8]
  0000b de c1   faddp  ST(1), ST(0)
  0000d dd 18   fstp  QWORD PTR [eax]
  0000f dd 42 08  fld  QWORD PTR [edx+8]
  00012 dc 49 20  fmul  QWORD PTR [ecx+32]
  00015 dd 02   fld  QWORD PTR [edx]
  00017 dc 49 08  fmul  QWORD PTR [ecx+8]
  0001a de c1   faddp  ST(1), ST(0)
  0001c dd 58 08  fstp  QWORD PTR [eax+8]
  0001f dd 42 20  fld  QWORD PTR [edx+32]
  00022 dc 49 18  fmul  QWORD PTR [ecx+24]
  00025 dd 42 18  fld  QWORD PTR [edx+24]
  00028 dc 09   fmul  QWORD PTR [ecx]
  0002a de c1   faddp  ST(1), ST(0)
  0002c dd 58 18  fstp  QWORD PTR [eax+24]
  0002f dd 41 08  fld  QWORD PTR [ecx+8]
  00032 dc 4a 18  fmul  QWORD PTR [edx+24]
  00035 dd 42 20  fld  QWORD PTR [edx+32]
  00038 dc 49 20  fmul  QWORD PTR [ecx+32]
  0003b de c1   faddp  ST(1), ST(0)
  0003d dd 58 20  fstp  QWORD PTR [eax+32]
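
Reading the listing above: the offsets used are 0, 8, 24 and 32 bytes, which
would be elements 0, 1, 3 and 4 of a row-major 3x3 of doubles, and there are 8
fmul and 4 faddp instructions, against 27 multiplies and 18 adds for a general
3x3 product. A plain C++ sketch of the same arithmetic is below. The function
name and the choice of which pointer is the left or right operand are
assumptions made for illustration, not taken from the source.

```cpp
#include <cassert>

// lhs, rhs and out each point at 9 doubles laid out row-major. The third row
// and third column are assumed to be the static (0, 0, 1), so only elements
// 0, 1, 3 and 4 are read or written here.
inline void mul_rotation3x3(double const* lhs, double const* rhs, double* out)
{
    out[0] = lhs[0] * rhs[0] + lhs[1] * rhs[3];
    out[1] = lhs[0] * rhs[1] + lhs[1] * rhs[4];
    out[3] = lhs[3] * rhs[0] + lhs[4] * rhs[3];
    out[4] = lhs[3] * rhs[1] + lhs[4] * rhs[4];
}

inline void example()
{
    // Same values as the matrix built in main: (1 2 0 / 4 5 0 / 0 0 1).
    double const m[9] = {1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 0.0, 0.0, 1.0};
    double r[9] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0};
    mul_rotation3x3(m, m, r);
    assert(r[0] == 9.0 && r[1] == 12.0 && r[3] == 24.0 && r[4] == 33.0);
}
```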

int main()
{
    matrix_type matrix(
        1.,2.,zero(),
        4.,5.,zero(),
        zero(),zero(),one()
    );
    typedef  quanta::matrix_row<2,matrix_type,quanta::as_const_ref> row0_type;
    row0_type row0(matrix);

    std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(row0) <<'\n';

    typedef  quanta::matrix_col<2,matrix_type,quanta::as_const_ref> col2_type;
    col2_type col2(matrix);

    std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(col2) <<'\n';
    quanta::dot_product<0,0,matrix_type::cols> dot;

    std::cout << dot(matrix,matrix) <<'\n';

    typedef quanta::matrix_mux<3,3,3,3> mux_type;
    mux_type mux;
    mux_type::result<matrix_type,matrix_type>::type result = mux(matrix,matrix);

    std::cout << result.at<0,0>() <<'\n';

}

main function assembler for std::cout << result.at<0,0>() <<'\n';

; Line 84
  000c8 dd 05 00 00 00 00  fld  QWORD PTR __real@4022000000000000
  000ce 51   push  ecx
  000cf dd 1c 24  fstp  QWORD PTR [esp]
  000d2 e8 00 00 00 00  call  ??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@N@Z ; std::basic_ostream<char,std::char_traits<char> >::operator<<
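
As a cross-check on the constant loaded above: assuming the hex digits in the
symbol name __real@4022000000000000 are the bit pattern of an IEEE 754 double
(my reading of the compiler's naming, not something stated in the listing),
they decode to 9.0, which is row 0 dot column 0 of the test matrix, i.e.
1*1 + 2*4 + 0*0. The helper name bits_to_double is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets a 64-bit pattern as a double, assuming IEEE 754 doubles.
inline double bits_to_double(std::uint64_t bits)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

inline void example()
{
    assert(bits_to_double(0x4022000000000000ULL) == 9.0);
    assert(1.0 * 1.0 + 2.0 * 4.0 + 0.0 * 0.0 == 9.0);
}
```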