[multi_array] Is there interest for an [index_list] library?

(This email has earlier been sent to the wrong list. Thanks to Steven Watanabe for pointing that out.) Dear all, I would like to gauge the interest of the community in a library that can serve 1) as a complement to Boost.MultiArray and 2) as an alternative to Boost.MultiArray. For the first type of use, an example of the candidate library's capabilities is that it makes available a special iterator "it" that allows to browse all the elements of a MultiArray "m". This special iterator has an it.indices() members that returns a "list" Collection of indices (i.e. an IndexList) such that "m(list)" and "*it" are the same elements. This can be useful for functions whose effect on an element of "m" depends on where this element is situated in "m". The second way to use the candidate library is as an alternative to Boost.MultiArray of dynamically chosen (execution time) dimensionality. Moreover, the elements do not necessarily need to be arranged in a multi-dimensional "box", as is the case in a MultiArray. For example, a "simplex" organization of the elements is already implemented. If you are interested to know more, I strongly suggest you to take a look at the example file in sandbox/index_list/libs/index_list/examples.cpp . The library itself is in sandbox/index_list/boost and a readme file (with some details concerning concepts and "priorities") is in sandbox/index_list/libs/README . An automatically generated (Doxygen) documentation is located in sandbox/index_list/libs/index_list/doc/index.html . Sadly, there is no "real" documentation for now. Many thanks for your time. Sincerely, Pierre-André Noël

On Sat, Apr 9, 2011 at 2:20 PM, Pierre-Andre Noel <noel.pierre.andre@gmail.com> wrote:
Dear all,
I would like to gauge the interest of the community in a library that can serve 1) as a complement to Boost.MultiArray and 2) as an alternative to Boost.MultiArray.
As far as I know, the biggest problem with MultiArray is that it is not a model of the Assignable concept: the LHS doesn't resize. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

As far as I know, the biggest problem with MultiArray is that it is not a model of the Assignable concept: the LHS doesn't resize.
I guess that the (not-yet-existing) indexable_space class (name non-definitive) could be made Assignable without much problem. However, I am not a computer scientist and I would not have thought this to be an important point. If there turns out to be a demand for the proposed library, this kind of considerations would definitely be one point where I need the input of the community. On Sun, Apr 10, 2011 at 9:31 AM, Dave Abrahams <dave@boostpro.com> wrote:
On Sat, Apr 9, 2011 at 2:20 PM, Pierre-Andre Noel <noel.pierre.andre@gmail.com> wrote:
Dear all,
I would like to gauge the interest of the community in a library that can serve 1) as a complement to Boost.MultiArray and 2) as an alternative to Boost.MultiArray.
As far as I know, the biggest problem with MultiArray is that it is not a model of the Assignable concept: the LHS doesn't resize.
-- Dave Abrahams BoostPro Computing http://www.boostpro.com

I would like to gauge the interest of the community in a library that can serve 1) as a complement to Boost.MultiArray and 2) as an alternative to Boost.MultiArray.
Caveat, I've taken only the briefest of glances at what's in the sandbox...
For the first type of use, an example of the candidate library's capabilities is that it makes available a special iterator "it" that allows to browse all the elements of a MultiArray "m". This special iterator has an it.indices() members that returns a "list" Collection of indices (i.e. an IndexList) such that "m(list)" and "*it" are the same elements. This can be useful for functions whose effect on an element of "m" depends on where this element is situated in "m".
Does the functionality work atop the MultiArray concept or atop boost::multi_array and boost::multi_array_ref? One can linearly walk the contents of the latter using iterators foo.data() and foo.data() + foo.dimensionality. I think only the MultiArray concept would need further capabilities for a linear walk independent of the dimensionality (as .data() isn't part of the concept). A "position-dependent" behavior can be achieved (atop multi_array and multi_array_ref) by taking the distance between foo.data() and the current iterator and then playing some games with foo.strides() or std::distance(). I believe the same can be achieved atop the pure MultiArray concept using foo.origin(). No need to extend the concept (though the required couple of lines would be frustratingly lengthy). Personally, I've found that writing three overrides of most methods is sufficient to take advantage of the contiguous storage guaranteed by boost::multi_array and boost::multi_array_ref while allowing the general MultiArray cases to work. Aside from runtime dimension selection, what does 'box' add beyond the current extent_gen-based capabilities. (Compile-time) dimension-independent code can be written by templating on the MultiArray NumDims parameter (http://agentzlerich.blogspot.com/2010/01/providing-fill-and-foreach-algorith... is an example). Runtime-decisions about the dimensionality of a MultiArray definitely are interesting but I'm unimaginative and can't think of the use case...
The second way to use the candidate library is as an alternative to Boost.MultiArray of dynamically chosen (execution time) dimensionality. Moreover, the elements do not necessarily need to be arranged in a multi-dimensional "box", as is the case in a MultiArray. For example, a "simplex" organization of the elements is already implemented.
I'm afraid I can't begin to speak intelligently about the simplex organization... I don't think I'm your target audience so take all of this with a grain of salt. I very much appreciate your attempts to further improve the MultiArray concept as I use it heavily in my own work. Also, just cause I find it interesting, ndarray is a fairly cool alternative (https://code.google.com/p/ndarray/). - Rhys

On 10/04/2011 22:16, Rhys Ulerich wrote:
I don't think I'm your target audience so take all of this with a grain of salt. I very much appreciate your attempts to further improve the MultiArray concept as I use it heavily in my own work. Also, just cause I find it interesting, ndarray is a fairly cool alternative (https://code.google.com/p/ndarray/).
You might also be interested in how NT2 does it with its table class. It mimics the matlab interface (which numpy also does), so you can do funny things to get partial views etc. Storage order of each dimension, size of each dimension, starting index of each dimension can all be specified as template options, a size of -1 meaning a runtime size. It can be resized at runtime, and number of dimensions can be reduced, since it treats tables with N-1 dimensions the same as tables of N dimensions with the last dimension being of size 1.

Sorry for the slow reply...
Does the functionality work atop the MultiArray concept or atop boost::multi_array and boost::multi_array_ref?
Most of the proposed code works "by itself" (without need for MultiArray and/or boost::multi_array and related). Some constructors have the option of receiving a model of the MultiArray concept in order to be "compatible" with it. The iterator I was talking about in the earlier message is definitely made to work with the MultiArray concept. Finally, for convenience, some "utilities" directly refer to boost::multi_array when defining objects that are likely to come up in the application. (Note that what I propose are new concepts, not changes to already existing ones.)
One can linearly walk the contents of the latter using iterators foo.data() and foo.data() + foo.dimensionality. I think only the MultiArray concept would need further capabilities for a linear walk independent of the dimensionality (as .data() isn't part of the concept).
You probably meant foo.data() and foo.data() + foo.num_elements(). Yes, it is true that you can visit all the elements that way. In fact, this is the first example in the file sandbox/index_list/libs/index_list/examples.cpp .
A "position-dependent" behavior can be achieved (atop multi_array and multi_array_ref) by taking the distance between foo.data() and the current iterator and then playing some games with foo.strides() or std::distance(). I believe the same can be achieved atop the pure MultiArray concept using foo.origin(). No need to extend the concept (though the required couple of lines would be frustratingly lengthy).
My "personal library" that is the ancestor of the current IndexList was initially composed of those "frustratingly lengthy" couples of lines. With time, I added functionalities. Since implementing simplex_domain was "non-trivial", I thought that others could be interested: I cleaned up the code and sent it to the sandbox before posting the previous message. The "Domain" concept defined in the IndexList library generalizes the conversion you are talking about. Passing a number in the range [0,num_elements()-1] to the "expand()" member of a Domain returns a collection of indices, and passing that collection of indices to its "reduce()" member will return the original number. Given any collection of indices, a Domain's "is_valid()" member will return true if this collection could have been obtained from a call to "expand(i)" using "i" in range [0,num_elements()-1], and false otherwise. For box_domain, "reduce()" reproduce the same algorithm Boost.MultiArray uses to locate an element in memory (using shape() and index_bases()), "expand()" performs the opposite task as you earlier proposed and "is_valid()" simply compare each element of the provided collection of indices to index_bases and index_bases + shape.
Aside from runtime dimension selection, what does 'box' add beyond the current extent_gen-based capabilities. (Compile-time) dimension-independent code can be written by templating on the MultiArray NumDims parameter ( http://agentzlerich.blogspot.com/2010/01/providing-fill-and-foreach-algorith... is an example).
Excluding runtime dimensionality, the sole advantages I see for data organized in a "box" way are: - it facilitate coding concerning the current position using "expand()", - it lighten the treatment of "boundary conditions" using "is_valid()" (By boundary conditions, I refer to the special treatment that is sometime required, in the form of "ifs", around the edges of the MultiArray.) and - for the cases where, although it is ok to store the data using a "box" organization, the same data has to be accessed in a different way that is not trivial to implement (The access could then be made using a different domain than a box_domain.).
Runtime-decisions about the dimensionality of a MultiArray definitely are interesting but I'm unimaginative and can't think of the use case...
For me it is quite common, but I guess scientific computing is the special case... From my experience, runtime dimensionality is mostly required when you have to read the data from a file of unknown dimensionality and when "main(int argc, char *argv[])" receives arguments that affects the dimensionality.
I'm afraid I can't begin to speak intelligently about the simplex organization...
In two dimensions, a simplex_domain could have the following valid elements: (0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), (2,1), (3,0). In other words, the triangle contained within the box limited by (0,0) and (3,3), with the further constrain that the sum of all the indices does not exceed 3. Note that one could simply use a multi_array(boost::extents[4][4]) and discard the unused elements. For the simple example shown, proceeding that way could probably be an ok solution, since there would only be 6 unused elements out of a total of 16 elements in the MultiArray. For larger domains in two dimensional space, the limiting behaviour is that 50% of the elements will have to be unused. However, if you have to represent data that follow such a simplex topology in high dimensional space, the difference grows much larger between the required simplex and the smallest box in which this simplex "fits in". This is covered in the example file.
I don't think I'm your target audience so take all of this with a grain of salt. I very much appreciate your attempts to further improve the MultiArray concept as I use it heavily in my own work. Also, just cause I find it interesting, ndarray is a fairly cool alternative (https://code.google.com/p/ndarray/).
Thanks for the link!

On 04/11/11 15:15, Pierre-Andre Noel wrote: [snip]
Runtime-decisions about the dimensionality of a MultiArray definitely are interesting but I'm unimaginative and can't think of the use case...
For me it is quite common, but I guess scientific computing is the special case... From my experience, runtime dimensionality is mostly required when you have to read the data from a file of unknown dimensionality and when "main(int argc, char *argv[])" receives arguments that affects the dimensionality.
Last Novemeber: http://article.gmane.org/gmane.comp.lib.boost.user/63640 Alle had a use case for runtime dimensionality. In private emails between now and then, I did mention (on April 12) this thread and I think he's decided to abandon use of variants. I did propose something that did have dynamic dimensionality; however, he may want to consider this index_list instead since it's much further developed. -regards, Larry

On 04/15/11 11:32, Larry Evans wrote:
On 04/11/11 15:15, Pierre-Andre Noel wrote: [snip]
Runtime-decisions about the dimensionality of a MultiArray definitely are interesting but I'm unimaginative and can't think of the use case... [snip] Also, there's this quote:
One particularly important place where run-time specified dims is needed is for supporting a scripting interface. from: http://article.gmane.org/gmane.comp.lib.boost.devel/44640 which describes another use case. -Larry

On 04/11/11 15:15, Pierre-Andre Noel wrote: [snip]
The "Domain" concept defined in the IndexList library generalizes the conversion you are talking about. Passing a number in the range [0,num_elements()-1] to the "expand()" member of a Domain returns a collection of indices, and passing that collection of indices to its "reduce()" member will return the original number. Given any collection of indices, a Domain's "is_valid()" member will return true if this
collection
could have been obtained from a call to "expand(i)" using "i" in range [0,num_elements()-1], and false otherwise.
For box_domain, "reduce()" reproduce the same algorithm Boost.MultiArray uses to locate an element in memory (using shape() and index_bases()), "expand()" performs the opposite task as you earlier proposed and "is_valid()" simply compare each element of the provided collection of indices to index_bases and index_bases + shape.
This is similar to the encode and decode functions in apl. The difference is with the direction (increasing or decreasing w.r.t. value) of the strides and the fact that strides are already calculated in index_list; whereas in the encode/decode functions, they have to be calculated from scratch (from the shape vector). Here's the equivalent to reduce and expand using functions from the std library and variadic templates: -{{---cut here--- enum dirs //directions(used to flag order of processing an ordered sequence). { dir_fwd //forward direction. , dir_rev //reverse direction. }; ... int const my_dir /**@brief * Either 0 or 1. * If 1, then my_strides[i]/my_strides[i+1] > 0 * Otherwise, my_strides[i+1]/my_strides[i] > 0 */ ; typedef unsigned index_t ; typedef std::vector<index_t> strides_t ; strides_t my_strides /**@brief * The strides for each axis in the array. */ ; ... template < typename InpIter , typename OutIter > OutIter init_strides ( InpIter sizes_beg , InpIter sizes_end , OutIter strides ) /**@brief * Helper function for int_strides( dirs, Sizes...) */ { *strides=1; return std::partial_sum ( sizes_beg , sizes_end , ++strides , std::multiplies<typename InpIter::value_type>() ); } template < typename... Sizes > index_t init_strides ( dirs a_dir , Sizes... a_size ) /**@brief * Calculates strides of the array with shape, a_size... * If(a_dir==dir_fwd) then strides increase with index; * otherwise, they decrease with index. * Returns the size of the whole array, IOW, the * produce of a_size... */ { strides_t const sizes({a_size...}); index_t result; if(a_dir==dir_fwd) { auto it_v=init_strides ( sizes.begin() , sizes.end() , my_strides.begin() ); result=*(--it_v); } else { auto it_v=init_strides ( sizes.rbegin() , sizes.rend() , my_strides.rbegin() ); result=*(--it_v); } std::cout<<"result="<<result<<"\n"; return result; } index_t rank()const { return my_strides.size()-1; } ... template < typename... Index > index_t offset_at_indices ( Index... a_index )const /**@brief * The offset of element in my_data * corresponding to indices, a_index... */ { strides_t const indices({a_index...}); index_t const offset = std::inner_product ( indices.begin() , indices.end() , my_strides.begin()+my_dir , index_t(0) ); return offset; } template < typename InpIter , typename OutIter > void indices_at_offset ( index_t a_offset , InpIter strides_beg , InpIter strides_end , OutIter indices )const /**@brief * Helper function for unary indices_at_offset(see below). */ { std::transform ( strides_beg , strides_end , indices , [&a_offset](index_t stride) { index_t index=a_offset/stride; a_offset-=index*stride; return index; } ); } std::vector<index_t> indices_at_offset ( index_t offset )const /**@brief * Inverse of offset_at_indices */ { std::vector<index_t> indices(rank()); if(my_dir==dir_fwd) { indices_at_offset ( offset , my_strides.rbegin()+1 , my_strides.rend() , indices.rbegin() ); } else { indices_at_offset ( offset , my_strides.begin()+1 , my_strides.end() , indices.begin() ); } return indices; } -}}---cut here--- In the above, indices_at_offset corresonds to index_list expand, and offset_at_indices corresponds to index_list reduce.
Aside from runtime dimension selection, what does 'box' add beyond the current extent_gen-based capabilities. (Compile-time) dimension-independent code can be written by templating on the MultiArray NumDims parameter (
http://agentzlerich.blogspot.com/2010/01/providing-fill-and-foreach-algorith...
is an example).
Excluding runtime dimensionality, the sole advantages I see for data organized in a "box" way are: - it facilitate coding concerning the current position using "expand()", - it lighten the treatment of "boundary conditions" using "is_valid()" (By boundary conditions, I refer to the special treatment that is sometime required, in the form of "ifs", around the edges of the MultiArray.) and - for the cases where, although it is ok to store the data using a "box" organization, the same data has to be accessed in a different way that is not trivial to implement (The access could then be made using a different domain than a box_domain.).
I didn't notice any operations for doing transforms. Methods for doing transforms, as well as all the stride calculations as well as many other operations are described in the book: http://web.engr.oregonstate.edu/~budd/Books/aplc/ You might find it useful. AFAICT, one advantage that an explict dimension, which multi_array has, over this dynamic dimension is that it allows compile-time diagnosis of invalid number of indices. For example, if you decided to supply an operator[](unsigned Index), then the type of the result would have 1 less dimension. For example, with a multi_array<T,N>, the result would be Multi_array<T,N-1>, if N>1, or T& if N==1. How would you handle this with index_list? [snip] -regards, Larry

On 04/22/11 15:01, Larry Evans wrote:
On 04/11/11 15:15, Pierre-Andre Noel wrote: [snip]
The "Domain" concept defined in the IndexList library generalizes the conversion you are talking about. Passing a number in the range [0,num_elements()-1] to the "expand()" member of a Domain returns a collection of indices, and passing that collection of indices to its "reduce()" member will return the original number. Given any collection of indices, a Domain's "is_valid()" member will return true if this
collection
could have been obtained from a call to "expand(i)" using "i" in range [0,num_elements()-1], and false otherwise.
For box_domain, "reduce()" reproduce the same algorithm Boost.MultiArray uses to locate an element in memory (using shape() and index_bases()), "expand()" performs the opposite task as you earlier proposed and "is_valid()" simply compare each element of the provided collection of indices to index_bases and index_bases + shape.
This is similar to the encode and decode functions in apl. The difference is with the direction (increasing or decreasing w.r.t. value) of the strides and the fact that strides are already calculated in index_list; whereas in the encode/decode functions, they have to be calculated from scratch (from the shape vector).
Here's the equivalent to reduce and expand using functions from the std library and variadic templates:
-{{---cut here--- [snip] -}}---cut here--- This is now uploaded here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/array_dyn/
In the above, indices_at_offset corresonds to index_list expand, and offset_at_indices corresponds to index_list reduce.
Aside from runtime dimension selection, what does 'box' add beyond the current extent_gen-based capabilities. (Compile-time) dimension-independent code can be written by templating on the MultiArray NumDims parameter (
http://agentzlerich.blogspot.com/2010/01/providing-fill-and-foreach-algorith...
is an example).
Excluding runtime dimensionality, the sole advantages I see for data organized in a "box" way are: - it facilitate coding concerning the current position using "expand()", - it lighten the treatment of "boundary conditions" using "is_valid()" (By boundary conditions, I refer to the special treatment that is sometime required, in the form of "ifs", around the edges of the MultiArray.) and - for the cases where, although it is ok to store the data using a "box" organization, the same data has to be accessed in a different way that is not trivial to implement (The access could then be made using a different domain than a box_domain.).
I didn't notice any operations for doing transforms. Methods for doing transforms, as well as all the stride calculations as well as many other operations are described in the book:
http://web.engr.oregonstate.edu/~budd/Books/aplc/
You might find it useful.
On page 82 of the above reference, there's this: the value A[a_1;...;a_n] will be found at offset in the underlying subexpression given by: offset = alpha + a_1*gamma_1 + ... + a_n*gamm_n for some values of alpha and gamma_i. ... Given this uniform representation, it is perhaps not surprising that structural functions can be composed. ... For structural functions, the medium for composition is an object called the 'stepper'. Each node in the parse tree that represents a structural function has a stepper associated with it. The stepper is characterized by three vectors, each of size 'n' where 'n' is the rank of the result. q_i The dimension of the result that the i-th dimension of the current node corresponds to. t_i The index along the i_th coordinate of the current node that contributes to the initial element in the result ravel order. d_i The direction to move along the i_the dimension in in order to arive at the next element in the result along the q_i-th coordinate. Or, put in another way: q_i is modified by transpose. For example, initially: q_i = i but after a transpose, the resulting values, q_i', are reversed: q_i' = q_(n-(i-1) as shown on p. 84 of the reference. t_i has something to do with initial offsets. IOW, alpha = t_1*stride_1+...+t_n*stride_n is the alpha mentioned earlier. Initially, all t_i's are 0, so that the initial alpha is also 0. After a drop of, say 2 on coordinate 1, then: t_1=2 as indicated on p. 85 of the reference, and the alpha becomes 2*stride_2. d_i indicates whether: A[a_1;...;a_i...;an] is stored after (d_i=+1) or before(d_i=-1) A[a_1;...a_i+1...;an] and I think this is related to either your box_domain's ordering_ or ascending_. For example, after coordinate i is reversed, then the resulting values, t_i' and d_i' are: t_i'= s_i-t_i-1 d_i'= -d_i where s_i is the initial size in the i_th coordinate (or dimension, in your terms). These d_i's and stride_i's are used to calculate the gamma_i's mentioned earlier as shown on p. 86 of the reference. The code I mentioned earlier, in array_dyn directory in box_domain.hpp, has none of q_i, t_i or d_i, but they could be provided and encapsulated in some sort of stepper object that the reference mentions if there's anyone interested. [snip] -regards, Larry

On 04/30/11 10:51, Larry Evans wrote: [snip]
Here's the equivalent to reduce and expand using functions from the std library and variadic templates:
-{{---cut here--- [snip] -}}---cut here--- This is now uploaded here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/array_dyn/
[snip]
The code I mentioned earlier, in array_dyn directory in box_domain.hpp, has none of q_i, t_i or d_i, but they could be provided and encapsulated in some sort of stepper object that the reference mentions if there's anyone interested. [snip]
I knew I had mentioned the Budd code before, but I'd forgotten the name, 'stepper'. Searching for 'stepper' showed Budd's methods were mentioned during the multi_array review as shown by: http://article.gmane.org/gmane.comp.lib.boost.devel/59902/match=stepper -Larry

Thanks Larry for the suggestions. Although I will not prioritize this project (due both to a lack of time from my part and to a lack of general interest from the community), I definitely want to return working on it someday. When this moment will come, the book you suggested may save me from re-inventing the wheel... On Sat, Apr 30, 2011 at 12:24 PM, Larry Evans <cppljevans@suddenlink.net>wrote:
On 04/30/11 10:51, Larry Evans wrote: [snip]
Here's the equivalent to reduce and expand using functions from the std library and variadic templates:
-{{---cut here--- [snip] -}}---cut here--- This is now uploaded here:
http://svn.boost.org/svn/boost/sandbox/variadic_templates/sandbox/array_dyn/
[snip]
The code I mentioned earlier, in array_dyn directory in box_domain.hpp, has none of q_i, t_i or d_i, but they could be provided and encapsulated in some sort of stepper object that the reference mentions if there's anyone interested. [snip]
I knew I had mentioned the Budd code before, but I'd forgotten the name, 'stepper'. Searching for 'stepper' showed Budd's methods were mentioned during the multi_array review as shown by:
http://article.gmane.org/gmane.comp.lib.boost.devel/59902/match=stepper
-Larry
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (5)
-
Dave Abrahams
-
Larry Evans
-
Mathias Gaunard
-
Pierre-Andre Noel
-
Rhys Ulerich