RFC: runtime-flexible arrays

Hi everyone,

I have written a header for runtime-flexible multi-dimensional views and arrays and would love to get comments and feedback from the boost community:

- tutorial code: http://tinyurl.com/7hdldc9
- main website: http://www.andres.sc/marray.html
- doxygen reference doc: http://www.andres.sc/marray/doc/index.html
- git repository: https://github.com/bjoern-andres/marray
- tech report: http://arxiv.org/pdf/1008.2909v1

Unlike in Boost.MultiArray, the dimension of arrays is not a template parameter in Marray. Other features include:

- runtime-flexible dimension, shape, size and indexing order (first or last coordinate major order)
- arithmetic operators with automatic type promotion and expression templates
- access to entries via coordinates, scalar indices and STL-compliant random access iterators
- derived classes for matrices and vectors
- support of exchangeable STL-compliant allocators

Thanks for your consideration,
Bjoern
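For readers unfamiliar with the idea, a runtime-flexible view is essentially a data pointer plus shape and stride vectors whose length (the dimension) is chosen at run time. The following minimal C++ sketch only illustrates the concept; the class and member names are invented here and are not Marray's actual interface:

    // Conceptual sketch only -- not Marray's API. The dimension is the
    // runtime length of the shape/stride vectors, not a template parameter.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    template<class T>
    struct view_sketch {
        T* data;
        std::vector<std::size_t> shape;   // extent of each axis
        std::vector<std::size_t> strides; // elements to skip per axis

        // coordinate access: offset = sum_i coordinate[i] * strides[i]
        T& operator()(const std::vector<std::size_t>& c) {
            std::size_t off = 0;
            for (std::size_t i = 0; i < shape.size(); ++i) off += c[i] * strides[i];
            return data[off];
        }
    };

    int main() {
        std::vector<double> buffer(3 * 4, 0.0);
        // last-coordinate-major (row-major) strides for a 3x4 view
        view_sketch<double> v{buffer.data(), {3, 4}, {4, 1}};
        v({1, 2}) = 42.0;                        // element (1,2)
        std::cout << buffer[1 * 4 + 2] << '\n';  // prints 42
    }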

On 02/15/12 19:42, Bjoern Andres wrote:
Hi everyone,
I have written a header for runtime-flexible multi-dimensional views and arrays and would love to get comments and feedback from the boost community:
- tutorial code: http://tinyurl.com/7hdldc9
- main website: http://www.andres.sc/marray.html
- doxygen reference doc: http://www.andres.sc/marray/doc/index.html
- git repository: https://github.com/bjoern-andres/marray
- tech report: http://arxiv.org/pdf/1008.2909v1
Unlike in Boost.MultiArray, the dimension of arrays is not a template parameter in Marray. Other features include
- runtime-flexible dimension, shape, size and indexing order (first or last coordinate major order)
- arithmetic operators with automatic type promotion and expression templates
- access to entries via coordinates, scalar indices and STL-compliant random access iterators
- derived classes for matrices and vectors
- support of exchangeable STL-compliant allocators
Thanks for your consideration Bjoern
This sounds somewhat like: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/bo... with these extra features:
- arithmetic operators with automatic type promotion and expression templates
- access to entries via coordinates, scalar indices and STL-compliant random access iterators
- derived classes for matrices and vectors
- support of exchangeable STL-compliant allocators
However, array_dyn has the ability to permute its axes (an array transpose is just a reversal of the axes). A demonstration of an application of such a permutation of axes (well, actually, a rotation of the axes) is here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/li... and a more useful example is here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/li... That last array_dyn code was made in response to the post: http://article.gmane.org/gmane.comp.lib.boost.user/68017/match=multi+array+p... -regards, Larry

On 02/16/2012 02:42 AM, Bjoern Andres wrote:
Hi everyone,
I have written a header for runtime-flexible multi-dimensional views and arrays and would love to get comments and feedback from the boost community:
- tutorial code: http://tinyurl.com/7hdldc9
- main website: http://www.andres.sc/marray.html
- doxygen reference doc: http://www.andres.sc/marray/doc/index.html
- git repository: https://github.com/bjoern-andres/marray
- tech report: http://arxiv.org/pdf/1008.2909v1
Unlike in Boost.MultiArray, the dimension of arrays is not a template parameter in Marray.
In Boost.MultiArray, the dimension is not a template parameter; the number of dimensions is. I assume you always use the maximum number of dimensions, and treat arrays of size N*M the same as arrays of size N*M*1*1*1*1.
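For comparison, a minimal Boost.MultiArray example of this point (standard Boost.MultiArray usage, shown only for illustration): the number of dimensions is a compile-time template parameter, while the extents are runtime values.

    #include <boost/multi_array.hpp>

    int main() {
        // the 3 is a template parameter: the number of dimensions is
        // fixed at compile time, while the extents are chosen at run time
        boost::multi_array<double, 3> a(boost::extents[2][3][4]);
        a[1][2][3] = 1.0;   // a 2D array of size N*M would need a different type
    }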
Other features include
- runtime-flexible dimension, shape, size and indexing order (first or last coordinate major order)
- arithmetic operators with automatic type promotion and expression templates.
This should be decoupled from the container itself. Algorithms and containers are different things. If you're doing expression templates, you should re-use Boost.Proto to have a proper framework to deal with expression trees. And if you're supporting arithmetic operators, it will be expected that you generate high-performance parallel code (like Blitz++, Eigen, or NT2 -- I am the author of the latter). I'm not sure type promotion is such a good idea.
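To make the expression-template and type-promotion point concrete, here is a minimal hand-rolled sketch using std::common_type. It is not the implementation of Marray, NT2, or Boost.Proto; a free function make_sum stands in for operator+ to keep the sketch short:

    // Minimal hand-rolled expression-template sketch with type promotion.
    // Not the implementation of any library discussed in this thread.
    #include <cstddef>
    #include <type_traits>
    #include <vector>

    template<class T>
    struct vec {
        using value_type = T;
        std::vector<T> d;
        T operator[](std::size_t i) const { return d[i]; }
        std::size_t size() const { return d.size(); }
    };

    // expression node: holds references, evaluates lazily in operator[]
    template<class L, class R>
    struct sum_expr {
        using value_type = typename std::common_type<typename L::value_type,
                                                     typename R::value_type>::type;
        const L& l;
        const R& r;
        value_type operator[](std::size_t i) const { return l[i] + r[i]; }
        std::size_t size() const { return l.size(); }
    };

    // stands in for operator+ to keep the sketch short
    template<class L, class R>
    sum_expr<L, R> make_sum(const L& l, const R& r) { return {l, r}; }

    int main() {
        vec<int>    a{{1, 2, 3}};
        vec<double> b{{0.5, 0.5, 0.5}};
        auto e = make_sum(a, b);   // builds a tiny expression node, no temporary array
        static_assert(std::is_same<decltype(e)::value_type, double>::value,
                      "int + double promotes to double");
        std::vector<double> c(e.size());
        for (std::size_t i = 0; i < c.size(); ++i)
            c[i] = e[i];           // evaluated element-wise on assignment
    }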
- access to entries via coordinates, scalar indices and STL-compliant random access iterators.
I find it confusing that iterators flatten the data. I'd rather they iterate on the external dimension. If you want to iterate over flattened data, just make a view with a flattened shape and iterate that. Also, if you support 1D and nD access, why not also support 2D or 3D access within a 4D array? Something else: what we do in NT2 is use an Iliffe index as described in the NRC (Numerical Recipes in C) book. This allows us to make a view that extracts an arbitrary rectangle or hypercube and to add padding between lines without extra modulo operations. Do you use something like that as well?
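For readers without the NRC book at hand, a rough sketch of the Iliffe-vector idea in the 2D case (row pointers into possibly padded storage); this is only an illustration, not NT2's actual data structure:

    // Rough sketch of an Iliffe vector for a 2D view: a vector of row
    // pointers into (possibly padded) storage. Element access needs no
    // multiply or modulo, and a sub-rectangle view is just another set
    // of row pointers. Not NT2's actual data structure.
    #include <cstddef>
    #include <vector>

    struct iliffe_view2d {
        std::vector<double*> rows;               // one pointer per row
        double& operator()(std::size_t i, std::size_t j) { return rows[i][j]; }
    };

    int main() {
        const std::size_t height = 4, padded_width = 8;  // logical width 5, padded to 8
        std::vector<double> storage(height * padded_width, 0.0);

        // full view: row i starts at offset i * padded_width
        iliffe_view2d full;
        for (std::size_t i = 0; i < height; ++i)
            full.rows.push_back(storage.data() + i * padded_width);
        full(2, 3) = 1.0;

        // sub-view of the 2x2 rectangle whose corner is at (1, 2):
        // just another set of shifted row pointers, no copying
        iliffe_view2d sub;
        for (std::size_t i = 1; i < 3; ++i)
            sub.rows.push_back(full.rows[i] + 2);
        sub(0, 0) = 2.0;                          // same memory as full(1, 2)
        return full(1, 2) == 2.0 ? 0 : 1;
    }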

On 02/16/12 01:17, Mathias Gaunard wrote:
On 02/16/2012 02:42 AM, Bjoern Andres wrote:
Hi everyone,
I have written a header for runtime-flexible multi-dimensional views and arrays and would love to get comments and feedback from the boost community:
- tutorial code: http://tinyurl.com/7hdldc9
- main website: http://www.andres.sc/marray.html
- doxygen reference doc: http://www.andres.sc/marray/doc/index.html
- git repository: https://github.com/bjoern-andres/marray
- tech report: http://arxiv.org/pdf/1008.2909v1
[snip] Something else: what we do in NT2 is use an Iliffe index as described in the NRC (Numerical Recipes in C) book. This allows us to make a view that extracts an arbitrary rectangle or hypercube and to add padding between lines without extra modulo operations.
How would NT2 do a transpose like the one on line 00201 of:
http://www.andres.sc/marray/doc/tutorial_8cxx_source.html ?
Judging from a brief look at: http://en.wikipedia.org/wiki/Array_data_structure which suggests an Iliffe data structure contains pointers to subarrays, I'd guess that would be more difficult than if a dope vector data structure, described on the same web page, were used. With a dope vector, a transpose would just mean reversing the dope vector. I'm guessing marray uses something like a dope vector. However, with the Iliffe data structure, I'd guess you'd have to recalculate several pointers. Right? -regards, Larry
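To make the dope-vector point concrete, a minimal sketch (not marray's actual internals): a transpose touches only the shape and stride arrays, never the data.

    // Minimal sketch of a dope-vector style transpose: the data is not
    // touched, only the shape and stride arrays are reversed.
    // Not marray's actual internals.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct dope_view {
        double* data;
        std::vector<std::size_t> shape, strides;
        double& operator()(std::size_t i, std::size_t j) {
            return data[i * strides[0] + j * strides[1]];
        }
        void transpose() {                          // O(dimension), no copy
            std::reverse(shape.begin(), shape.end());
            std::reverse(strides.begin(), strides.end());
        }
    };

    int main() {
        std::vector<double> buf(2 * 3);
        dope_view v{buf.data(), {2, 3}, {3, 1}};    // 2x3, row-major
        v(0, 2) = 7.0;
        v.transpose();                              // now a 3x2 view
        // v(2, 0) aliases the same element as the old v(0, 2)
        return v(2, 0) == 7.0 ? 0 : 1;
    }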

On 02/16/2012 01:40 PM, Larry Evans wrote:
How would NT2 do a transpose like the one on line 00201 of:
What you can do is change the "storage order", i.e. consider that a transposed matrix in row-major is the same as the same matrix reinterpreted in column-major. I assume this library (marray) does just that. But mixing operations between different storage orders just causes a massive amount of problems if you want to vectorize and use the cache efficiently, so we don't support it. I don't think we really have native support for transposed views other than with the actual type of the expression template trans(a).
? Judging from a brief look at:
http://en.wikipedia.org/wiki/Array_data_structure
which suggests an Iliffe data structure contains pointers to subarrays, I'd guess that would be more difficult than if a dope vector data structure, described on the same web page, were used. With a dope vector, a transpose would just mean reversing the dope vector. I'm guessing marray uses something like a dope vector. However, with the Iliffe data structure, I'd guess you'd have to recalculate several pointers. Right?
We don't have a special function for in-place transposition, just transpose copy (which is fused with all other element-wise operations). In any case, rebuilding the index of the result is necessary whenever the size or shape changes.

On 16/02/2012 15:21, Mathias Gaunard wrote:
But mixing operations between different storage orders just causes massive amount of problems if you want to vectorize and use the cache efficiently, so we don't support it.
We don't support it for exactly that reason. One possible solution was to use the majority storage order, but it leads to poor performance. I didn't find any use case where such a mix was required.
I don't think we really have native support for transposed views other than with the actual type of the expression template trans(a).
and transconj for complex, but this was a given.

On 02/16/12 10:20, Joel Falcou wrote:
On 16/02/2012 15:21, Mathias Gaunard wrote:
But mixing operations between different storage orders just causes massive amount of problems if you want to vectorize and use the cache efficiently, so we don't support it.
We don't support it for exactly that reason. One possible solution was to use the majority storage order, but it leads to poor performance. I didn't find any use case where such a mix was required.
Well, there's the case for rotated axes, as requested by: http://article.gmane.org/gmane.comp.lib.boost.user/68017/match=multi+array+p... However, I've not done any performance measures (mostly because I'm not sure how to do meaningful measurements). What benchmarks are you using to measure the performance?

One problem I see with the array_dyn rotated axes is that the rotation is not done on the array itself but on the indices to the array. Thus, only the offsets are calculated. The offset then needs to be used as an index to the underlying storage (which is something like multi_array's data member function). Actually, the calculation is more like an iterator, but an offset iterator, not an iterator into the array. I fear that that probably takes too long relative to other methods. The output from the example: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/li... illustrates how the stack of indices changes with the iteration. It has been used to do a matrix product as well as a tridiag solution: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/li... That tridiag method was then used to solve PDEs here: http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/stepper/li...

Although it is probably slower if the array is unrotated, if rotation is needed, as in array_dyn.diff_pde.cpp, then it may be faster, since using something like nt2 or multi_array would, AFAICT, require copying the array to a new storage order. Petros (see the match=multi+array+pdes link above) has somehow solved the problem; however, I don't know how. Petros? OTOH, maybe you have some ideas of how the rotated-axes problem can be solved with nt2 and used to solve the PDE examples in array_dyn.diff_pde.cpp. -regards, Larry
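A small sketch of the "offset iterator" idea described above, assuming the offsets are computed from a coordinate stack and per-axis strides (this is only an illustration, not array_dyn's actual code):

    // Iterate a 3D array in a rotated axis order by keeping a coordinate
    // "stack" and computing an offset into the flat storage at each step.
    // Illustration only, not array_dyn's actual code.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t shape[3]   = {2, 3, 4};
        const std::size_t strides[3] = {12, 4, 1};   // last-coordinate-major
        const std::size_t axis[3]    = {2, 0, 1};    // rotated axis order
        std::vector<double> storage(2 * 3 * 4);

        std::size_t c[3] = {0, 0, 0};                // coordinate stack
        for (bool done = false; !done; ) {
            std::size_t offset = 0;                  // offset into flat storage
            for (int k = 0; k < 3; ++k) offset += c[k] * strides[k];
            storage[offset] += 1.0;                  // visit element

            // odometer-style increment; the innermost loop runs over axis[2]
            int k = 2;
            for (;; --k) {
                if (++c[axis[k]] < shape[axis[k]]) break;
                c[axis[k]] = 0;
                if (k == 0) { done = true; break; }
            }
        }
        std::cout << storage[5] << '\n';  // every element visited once: prints 1
    }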

On 02/16/2012 06:40 PM, Larry Evans wrote:
Although it is probably slower if the array is unrotated, if rotation is needed, as in array_dyn.diff_pde.cpp, then it may be faster, since using something like nt2 or multi_array would, AFAICT, require copying the array to a new storage order. Petros (see the match=multi+array+pdes link above) has somehow solved the problem; however, I don't know how. Petros? OTOH, maybe you have some ideas of how the rotated-axes problem can be solved with nt2 and used to solve the PDE examples in array_dyn.diff_pde.cpp. -regards, Larry
Storage order adaptation never triggers a copy in NT2; it changes how the for-loop nest is generated around the computation, unless you do: a = as_< storage_order_stuff >(b), where the copy is of course explicit. NT2's core primitive is a loop nest generator over which we have some sort of control from user land. Currently we don't support storage orders of arbitrary complexity, but a storage order could be defined as a permutation of the dimensions.
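As an illustration of "no copy, different loop nest" (this is not NT2 code): the same flat buffer can be traversed with either loop order, and only the nest around the computation changes.

    // Adapting to a storage order changes how the loop nest is generated,
    // not where the data lives; the inner loop follows the contiguous
    // dimension. Illustration only, not NT2 code.
    #include <cstddef>
    #include <vector>

    void fill_row_major(double* a, std::size_t rows, std::size_t cols) {
        for (std::size_t i = 0; i < rows; ++i)       // contiguous along columns
            for (std::size_t j = 0; j < cols; ++j)
                a[i * cols + j] = double(i + j);
    }

    void fill_column_major(double* a, std::size_t rows, std::size_t cols) {
        for (std::size_t j = 0; j < cols; ++j)       // contiguous along rows
            for (std::size_t i = 0; i < rows; ++i)
                a[j * rows + i] = double(i + j);
    }

    int main() {
        std::vector<double> buf(3 * 4);
        fill_row_major(buf.data(), 3, 4);     // same buffer, two different nests
        fill_column_major(buf.data(), 3, 4);  // no copy in either case
    }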

On 02/16/12 12:17, Joel Falcou wrote:
On 02/16/2012 06:40 PM, Larry Evans wrote:
Although it is probably slower if the array is unrotated, if rotation is needed, as in array_dyn.diff_pde.cpp, then it may be faster, since using something like nt2 or multi_array would, AFAICT, require copying the array to a new storage order. [snip]
Storage order adaptation never triggers a copy in NT2; it changes how the for-loop nest is generated around the computation, unless you do:
a = as_< storage_order_stuff >(b)
, where the copy is of course explicit.
NT2's core primitive is a loop nest generator over which we have some sort of control from user land. [snip]
Hi Joel, I've just downloaded nt2 from here: http://voxel.dl.sourceforge.net/project/nt2/nt2/nt2%202.0/nt2_2.0.tar.gz Are there some examples of this "loop nest generator" that I could look at to get a better idea of what you mean? TIA. -regards, Larry

On 02/16/2012 07:46 PM, Larry Evans wrote:
I've just downloaded nt2 from here:
http://voxel.dl.sourceforge.net/project/nt2/nt2/nt2%202.0/nt2_2.0.tar.gz
Are there some examples of this "loop nest generator" that I could look at to get a better idea of what you mean?
This is nt2 v2, which is deeply deprecated (how come these sites still forward there?). We have a GitHub now with the up-to-date code. Now, you may want to wait a bit, as I am currently fixing this very same storage order code ;)

Sorry for the noise: the loop nest generator is basically a recursive function that gets dimension information from a fusion sequence of dimension sets. The storage order is applied as a permutation on this sequence and passed to the generator. The shuffled position is passed to the whole expression tree, and when we're down in a terminal, it gets applied again to grant access to the proper memory. The shuffling of positions is done at compile time using MPL on the fusion::at_c parameter and incurs no overhead. That's the basic theory. Now, above 2D, most array representations and memory accesses incur a high payload on registers and may limit performance, so we treat an nD array as a flattened 2D one and aggregate the outer dimensions after shuffling.
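A greatly simplified runtime sketch of such a loop nest generator (the real one described above works on compile-time Fusion/MPL sequences; all names here are invented for illustration):

    // Recurse over the dimensions in a permuted (storage) order and invoke
    // a kernel with the full coordinate at the innermost level. Everything
    // here is invented for illustration; it is not NT2's generator.
    #include <cstddef>
    #include <functional>
    #include <vector>

    using coord = std::vector<std::size_t>;

    void loop_nest(const std::vector<std::size_t>& shape,
                   const std::vector<std::size_t>& order,   // permutation of dimensions
                   coord& c, std::size_t level,
                   const std::function<void(const coord&)>& kernel) {
        if (level == order.size()) { kernel(c); return; }   // terminal: apply kernel
        const std::size_t d = order[level];                 // dimension driven at this level
        for (c[d] = 0; c[d] < shape[d]; ++c[d])
            loop_nest(shape, order, c, level + 1, kernel);
    }

    int main() {
        std::vector<std::size_t> shape   = {2, 3, 4};
        std::vector<std::size_t> strides = {12, 4, 1};
        std::vector<double> data(2 * 3 * 4, 0.0);

        coord c(3, 0);
        // column-major-style nest: dimension 0 innermost
        loop_nest(shape, {2, 1, 0}, c, 0, [&](const coord& x) {
            std::size_t off = x[0] * strides[0] + x[1] * strides[1] + x[2] * strides[2];
            data[off] += 1.0;                                // visit each element once
        });
    }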

On 02/16/12 13:05, Joel Falcou wrote:
On 02/16/2012 07:46 PM, Larry Evans wrote:
I've just downloaded nt2 from here:
http://voxel.dl.sourceforge.net/project/nt2/nt2/nt2%202.0/nt2_2.0.tar.gz
Are there some examples of this "loop nest generator" that I could look at to get a better idea of what you mean?
This is nt2 v2, which is deeply deprecated (how come these sites still forward there?). We have a GitHub now with the up-to-date code. Now, you may want to wait a bit, as I am currently fixing this very same storage order code ;)
Is this the right place to look for the latest code? http://nt2.metascale.org/ Or maybe this: https://github.com/MetaScale/nt2 ? TIA. -regards, Larry
participants (4)
- Bjoern Andres
- Joel Falcou
- Larry Evans
- Mathias Gaunard