
"Joel de Guzman" <joel@boost-consulting.com> wrote in message news:eh85r6$ba$1@sea.gmane.org...
> Ullrich Koethe wrote:
>> Joel de Guzman wrote:
>>>> VIGRA doesn't have an explicit RGBA type (TinyVector<T, 4> can be used instead), because so far no-one came up with a convincing proposal for these operations. But without them, RGBA is pretty useless.
>>> Hmmm... TinyVector<T, 4>... I think VIGRA should use Fusion for that instead ;-)
>> I had a look at Fusion, but I'm not sure whether it would be helpful in this context. TinyVector is based on three design goals: it should support the std::vector interface (except for resize etc.),
> Like boost::array?
>> it should be fast (you have millions of these beasts in a single image),
> Definitely.
>> and it should behave like a built-in arithmetic type (except for division which is problematic because the zero vector is not the only one that may cause a division-by-zero error).
> No problem. But have you seen Andy's work on matrices using fusion?
The work on "tuple" matrices was originally conceived to enable use of my Quan types in transform matrices:

http://quan.sourceforge.net/quan_matters/doc/html/index.html

The IMO more important use, though, is to replace run time doubles with compile time "static" doubles, usually for values of 1 or 0. The effect of this is to reduce a typical 4 x 4 matrix multiply from 64 multiplies and 48 adds down to, for example, 9 multiplies and 9 adds in the case of a translation x rotation x translation transform. That is quite a profound reduction. Similar reductions are of course possible when applying the transform to vertices.

However there is a problem in VC7.1, which is that the compiler simply runs out of resources in relatively simple transforms using Fusion, and there is no way round that with Fusion AFAICS. OTOH there is no such problem in VC8 or gcc4.1.1, the other 2 compilers I tested.

Rather than lose VC7.1, I opted to try a hand rolled version. IOW I stripped Fusion out completely, removed the iterators, and provided custom vectors of 3, 4, 9 and 16 elements and custom rows and columns. This is not quite as neat as Fusion, where one algorithm can be applied to theoretically any combination of matrices. However, in looking at the assembler output from the hand made version, I saw that by simplifying the programming and removing the extra layers of references, the compiler now produces what looks to me perfect. (The example code here is simply a 3x3 rotation matrix multiplied by itself.)

N.B. as an improvement on perfect, it should also be noted that because this is a simple test with local constants, the compiler has in fact not instantiated this assembler code at all in the main function, but simply outputs constants. (This can be seen in the main assembler at the end.)
This is an improvement on the Fusion version, where I guess the references do provide a barrier to some optimisations, and functions were called in main. Be wary of short tests however ;-)

Note also the custom at_c functors, which I found useful. These enable the actual type of the result (reference, const reference, value) to be selected on an element by element basis. In fact quanta::as_ref etc. are functors, so arbitrary functors could be substituted, e.g. for multiplying by a constant.

IOW in light of this I am not sure now that using Fusion is optimal for what I want, but it did provide a good starting point and one could see this as optimising...

Source, with some extraneous stuff, is at the end. The assembler represents the mux(matrix,matrix) part before it is optimised out in this example. Finally the main assembler, showing output of a constant.

regards
Andy Little

    00001 dd 02        fld   QWORD PTR [edx]
    00003 dc 09        fmul  QWORD PTR [ecx]
    00005 dd 41 18     fld   QWORD PTR [ecx+24]
    00008 dc 4a 08     fmul  QWORD PTR [edx+8]
    0000b de c1        faddp ST(1), ST(0)
    0000d dd 18        fstp  QWORD PTR [eax]
    0000f dd 42 08     fld   QWORD PTR [edx+8]
    00012 dc 49 20     fmul  QWORD PTR [ecx+32]
    00015 dd 02        fld   QWORD PTR [edx]
    00017 dc 49 08     fmul  QWORD PTR [ecx+8]
    0001a de c1        faddp ST(1), ST(0)
    0001c dd 58 08     fstp  QWORD PTR [eax+8]
    0001f dd 42 20     fld   QWORD PTR [edx+32]
    00022 dc 49 18     fmul  QWORD PTR [ecx+24]
    00025 dd 42 18     fld   QWORD PTR [edx+24]
    00028 dc 09        fmul  QWORD PTR [ecx]
    0002a de c1        faddp ST(1), ST(0)
    0002c dd 58 18     fstp  QWORD PTR [eax+24]
    0002f dd 41 08     fld   QWORD PTR [ecx+8]
    00032 dc 4a 18     fmul  QWORD PTR [edx+24]
    00035 dd 42 20     fld   QWORD PTR [edx+32]
    00038 dc 49 20     fmul  QWORD PTR [ecx+32]
    0003b de c1        faddp ST(1), ST(0)
    0003d dd 58 20     fstp  QWORD PTR [eax+32]

    int main()
    {
        matrix_type matrix(
            1., 2., zero(),
            4., 5., zero(),
            zero(), zero(), one()
        );

        typedef quanta::matrix_row<2,matrix_type,quanta::as_const_ref> row0_type;
        row0_type row0(matrix);
        std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(row0) <<'\n';

        typedef quanta::matrix_col<2,matrix_type,quanta::as_const_ref> col2_type;
        col2_type col2(matrix);
        std::cout << quanta::of_vector::at_c<2,quanta::as_const_ref>()(col2) <<'\n';

        quanta::dot_product<0,0,matrix_type::cols> dot;
        std::cout << dot(matrix,matrix) <<'\n';

        typedef quanta::matrix_mux<3,3,3,3> mux_type;
        mux_type mux;
        mux_type::result<matrix_type,matrix_type>::type result = mux(matrix,matrix);
        std::cout << result.at<0,0>() <<'\n';
    }

main function assembler for std::cout << result.at<0,0>() <<'\n';

    ; Line 84
    000c8 dd 05 00 00 00 00  fld  QWORD PTR __real@4022000000000000
    000ce 51                 push ecx
    000cf dd 1c 24           fstp QWORD PTR [esp]
    000d2 e8 00 00 00 00     call ??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@N@Z ; std::basic_ostream<char,std::char_traits<char> >::operator<<
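
For readers who want to see why "static" zeros and ones remove arithmetic, here is a minimal sketch of the idea. It is not the quanta source: zero_t, one_t, to_double and dot3 are hypothetical names invented for this illustration, and it assumes a C++14 compiler, which is newer than the compilers discussed above. Each tag type carries its value in the type, so products and sums involving it are resolved by overload selection rather than by floating point instructions.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Hypothetical tag types standing in for compile-time 0 and 1.
struct zero_t {};
struct one_t {};

constexpr zero_t zero{};
constexpr one_t one{};

// Convert any element back to a run-time double when one is needed.
constexpr double to_double(zero_t) { return 0.0; }
constexpr double to_double(one_t) { return 1.0; }
constexpr double to_double(double x) { return x; }

// Products: anything times zero is zero, one is the identity.
constexpr zero_t operator*(zero_t, zero_t) { return {}; }
constexpr zero_t operator*(zero_t, one_t) { return {}; }
constexpr zero_t operator*(one_t, zero_t) { return {}; }
constexpr one_t operator*(one_t, one_t) { return {}; }
constexpr zero_t operator*(zero_t, double) { return {}; }
constexpr zero_t operator*(double, zero_t) { return {}; }
constexpr double operator*(one_t, double x) { return x; }
constexpr double operator*(double x, one_t) { return x; }

// Sums: zero is the identity, so adding it costs nothing.
constexpr zero_t operator+(zero_t, zero_t) { return {}; }
constexpr one_t operator+(zero_t, one_t) { return {}; }
constexpr one_t operator+(one_t, zero_t) { return {}; }
constexpr double operator+(one_t, one_t) { return 2.0; }
constexpr double operator+(zero_t, double x) { return x; }
constexpr double operator+(double x, zero_t) { return x; }
constexpr double operator+(one_t, double x) { return 1.0 + x; }
constexpr double operator+(double x, one_t) { return x + 1.0; }

// Dot product of a 3-element row with a 3-element column; the result
// type is whatever the element types combine to.
template <class A0, class A1, class A2, class B0, class B1, class B2>
constexpr auto dot3(A0 a0, A1 a1, A2 a2, B0 b0, B1 b1, B2 b2) {
    return a0 * b0 + a1 * b1 + a2 * b2;
}

// Row 2 times column 2 of the matrix in the post: no arithmetic is left.
static_assert(std::is_same<decltype(dot3(zero, zero, one, zero, zero, one)),
                           one_t>::value, "result is statically one");
// Row 0 times column 2: every term vanishes at compile time.
static_assert(std::is_same<decltype(dot3(1., 2., zero, zero, zero, one)),
                           zero_t>::value, "result is statically zero");

// Row 0 times column 0: only the two run-time products remain.
inline double top_left_of_square() {
    double r = dot3(1., 2., zero, 1., 4., zero);
    assert(r == 9.0);
    return r;
}

} // namespace sketch
```

Calling sketch::top_left_of_square() from a main function returns 9, i.e. 1*1 + 2*4, which agrees with the constant __real@4022000000000000 in the main assembler above (that bit pattern is the double 9.0). A real implementation would also need subtraction, negation and a heterogeneous matrix container, which this sketch leaves out.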