[multi_array] Why the precondition on assignment?

Peter Barker

23 Feb 2009

11:35 a.m.

Hello,

I wonder if anyone can give me the rationale behind boost::multi_index's assignment operator having the following precondition:

std::equal(this->shape(), this->shape() + this->num_dimensions(), x.shape());

as mentioned in this document: http://www.boost.org/doc/libs/1_38_0/libs/multi_array/doc/reference.html ?

I've got a multi_array as a data member of a class and would like to avoid having to write an assignment operator for my class just to do the resizing and assignment.

Regards, Peter Barker


Peter Barker

24 Feb

1:42 p.m.

On Mon, Feb 23, 2009 at 11:35 AM, Peter Barker <newbarker@gmail.com> wrote:

...

Hello,

I wonder if anyone can give me the rationale behind boost::multi_index's assignment operator having the following precondition:

That was meant to say boost::multi_array, like the subject does!

I can't see the logic in requiring the assigned-to object to have the same shape as the RHS object. Surely the array's old contents are being discarded and replaced with a copy of the RHS?

Anyone?

Regards, Pete

Thomas Klimpel

6:46 p.m.

Peter Barker wrote:

...

...
I wonder if anyone can give me the rationale behind boost::multi_index's assignment operator having the following precondition:

That meant to say boost::multi_array like the subject does!

I can't see the logic for requiring the assigned-to object to have the same shape as the RHS object - surely the array is being discarded and being made a copy of RHS?

It had something to do with concepts initially. It later turned more into a "communication issue". People seem to get very upset about the assignment operator of boost::multi_array and write long, emotional texts about why it is completely wrong the way it is. But when Ronald Garcia suggested how he could change boost::multi_array, they didn't even care to give any feedback at all.

...

Anyone?

Perhaps I misremember things and should have searched the archives before answering you, but the "Anyone?" suggests to me that you want at least some reaction or feedback to your question.

Regards, Thomas

alfC

25 Feb

3:25 a.m.

On 24 Feb., 05:42, Peter Barker <newbar...@gmail.com> wrote:

...

...
I wonder if anyone can give me the rationale behind boost::multi_index's assignment operator having the following precondition: That meant to say boost::multi_array like the subject does!

I am not sure, but it seems that the idea was to avoid (surprise) reallocation by all means. If the LHS has to change size, that means reallocation in the first place. Looking at the development history, it seems that the author avoided reallocation entirely in the first versions of the library. He later added the .resize method (which does reallocate), but assignment still follows the original prescription; that, I believe, is the reason. I might be wrong, or there might be other reasons. I would like to know too, if that is the case.

Note that, in multidimensional arrays, reallocation and/or copying is needed even if the array shrinks in some directions (and not only when it grows, as with one-dimensional arrays).

Regarding whether the current design is the right one, I am not sure. On one hand, it makes sense for a multidimensional array to forbid resizing altogether, because in almost all cases resizing means reallocation and copying, which costs about as much as copy-constructing a new multi_array. On the other hand, this restricted assignment doesn't follow the standard (expected) semantics.

Regards, Alfredo
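To illustrate the point about shrinking, here is a minimal standard-library sketch (not Boost.MultiArray's actual implementation) of resizing a row-major 2D block. The helper name resize_row_major is hypothetical. It shows that an element's flat offset depends on the row length, so elements have to be moved whenever the number of columns changes, even when it decreases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the overlapping region of a row-major
// rows x cols block into a freshly allocated new_rows x new_cols block.
std::vector<int> resize_row_major(const std::vector<int>& old_data,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t new_rows, std::size_t new_cols) {
    assert(old_data.size() == rows * cols);
    std::vector<int> new_data(new_rows * new_cols, 0);  // new elements are zero
    const std::size_t keep_rows = std::min(rows, new_rows);
    const std::size_t keep_cols = std::min(cols, new_cols);
    for (std::size_t r = 0; r < keep_rows; ++r) {
        for (std::size_t c = 0; c < keep_cols; ++c) {
            // The flat offset is r * (row length) + c, so it changes
            // whenever the number of columns changes.
            new_data[r * new_cols + c] = old_data[r * cols + c];
        }
    }
    return new_data;
}
```

Shrinking a 2x3 block holding 0..5 to 2x2 keeps the elements 0, 1, 3, 4: the element at (1,0) moves from flat offset 3 to flat offset 2, which is why truncating the storage in place is not enough.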

Peter Barker

9:52 a.m.

Thomas: You're right - I was hoping to get some feedback on it. It was bugging me because I thought I missed something obvious.

Alfredo: Thanks for highlighting the conflicting design goals of avoiding reallocation and expected semantics. My vote would definitely be for the expected semantics because if the assignment is present in the program, then the programmer probably knows what they're doing. More importantly, it would avoid the maintenance burden of implementing operator= in classes that have multi_array instances as members.

Thanks both!

Regards, Pete

alfC

26 Feb

8:32 a.m.

...

My vote would definitely be for the expected semantics because if the assignment is present in the program, then the programmer probably knows what they're doing. More importantly, it would avoid the maintenance burden of implementing operator= in classes that have multi_array instances as members.

I agree, especially because operator= in particular cannot be redefined in C++ outside the class. I would honor semantics above all, but not without some conceptual redesign first (I wouldn't know WHAT to redesign, though :)). That comes with the unavoidable cost of surprise reallocation and a change to the current behavior of the library. It also poses some questions: should base indices be copied too? What if base indices are different? What if storage order is different? Is the storage order copied?

Or suppose the opposite, that we want to forbid reallocation (and, for simplicity, even reshape). Then it turns out that multi_arrays of different shapes are effectively different "dynamic" types (e.g. an array of 10x10 and an array of 20x20 are different beasts although both are the same C++ type). I really don't know how "effective dynamic types" can be formalized in this context, except that we could throw an exception like bad_cast or domain_error or size_mismatch or something when using arrays of different shapes.

I hope someone gets around this problem in an elegant manner, either with or without modifying the library; but yes, the design goals of MultiArray should then be stated more clearly in the first place.

I admit that, in the few cases where I had to do something about this, I just checked whether I had to resize the destination array before assignment, but I was never happy with it. My next idea was to encapsulate the assignment in a function call that creates an object of class assign_helper, which holds a reference to the multi_array and has an operator= that does this check and any necessary resize for me, so I can use it like this:

smart_assign(A)=B; // takes care of eventual resize. Used instead of A=B (which may fail); both A and B are of type multi_array<double, 2>, for example.

That was my way around not being able to redefine operator=, but I am not sure yet whether it is a good idea and whether it is a real improvement over just something like 'smart_assign(A, B);'.

Regards, Alfredo

Joel Falcou

9:56 a.m.

alfC a écrit :

...

smart_assign(A)=B; // takes care of eventual resize. Used instead of A=B (which may fail), (both A and B are of type multi_array<double, 2> for example.)

that was my way around not being able to redefine operator=; but I am not sure if it is a good idea yet and whether it is a real improvement over just something like 'smart_assign(A, B);';

Tossing in my few cents, as I have dealt with such shenanigans many times. In my own multi-array-like class, I faced this dilemma. What I did was use template boost::parameters to specify the allocation policy in the type, and let the user choose between throwing, static_asserting, silently reallocating, preserving order, etc.

Why not use such policies? I know it involves drastically changing the multi_array interface, but maybe it's worth the hassle. Just let the default parameters be the old multi_array semantics.
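A minimal sketch of what such a policy could look like, using a plain template parameter instead of Boost.Parameter and a toy 2D container instead of multi_array. Every name here (grid, assert_on_mismatch, throw_on_mismatch, reallocate_on_mismatch) is hypothetical and is not taken from Joel's library. The default policy mimics the current multi_array behaviour of asserting on a shape mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical policies: each decides what a mismatched assignment does.
struct assert_on_mismatch {      // old behaviour; a no-op when NDEBUG is defined
    static void on_mismatch() { assert(false && "shape mismatch on assignment"); }
};
struct throw_on_mismatch {
    static void on_mismatch() { throw std::length_error("shape mismatch on assignment"); }
};
struct reallocate_on_mismatch {  // vector-like semantics: just take the new shape
    static void on_mismatch() {}
};

template <typename T, typename MismatchPolicy = assert_on_mismatch>
class grid {
public:
    grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    grid(const grid&) = default;

    grid& operator=(const grid& other) {
        if (this == &other) return *this;
        if (rows_ != other.rows_ || cols_ != other.cols_) {
            MismatchPolicy::on_mismatch();  // may assert or throw
        }
        data_ = other.data_;  // std::vector reallocates only if it has to
        rows_ = other.rows_;
        cols_ = other.cols_;
        return *this;
    }

    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

Because the policy is part of the type, grid<int, throw_on_mismatch> and grid<int, reallocate_on_mismatch> are distinct types, which is the "change of type signature" cost mentioned later in the thread.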

Peter Barker

10:40 a.m.

I've never tried to write a multi array class so perhaps I'm looking at the problem too simplistically. With std::vector (and probably all containers?) operator= will potentially reallocate, so I don't understand why boost::multi_array should be special. It's not really a *surprise* reallocation is it? operator= implies to me that you want to forget what's currently being held and assume a copy of the data that the other object has. I'd be *expecting* reallocations and existing indices/iterators to be invalidated. I've only just started using multi_array and I know there's a lot more to it than I'm aware of so apologies if my view on operator= is based more on ignorance than enlightenment! Regards, Pete

alfC

8:14 p.m.

...

With std::vector (and probably all containers?) operator= will potentially reallocate, so I don't understand why boost::multi_array should be special. It's not really a *surprise* reallocation is it?

First of all, linked lists (~std::list), ordered trees (~std::set) and queues (~std::queue) don't have this problem at all: they only need to allocate storage for the new elements (or something of order 1), and they are designed with that in mind.

Going back to std::vector: what you say is correct for a dynamic (one-dimensional) array, but std::vector implements more complex machinery to avoid reallocation in most cases, something that multi_array doesn't do at all. In general, as a std::vector grows (e.g. on push_back), it allocates more space than it needs. This is done automatically each time the vector *has* to grow, for example by doubling the allocated (reserved) space; or it can be done manually. That is why std::vector defines reserve() and capacity(), which are different from resize() and size(). Because of this trickery, reallocation in std::vector happens much less often than you may think. The price is extra storage (or, optionally, more manual control of the reserved space).

Going back to your example: if the assigned and the assignee std::vector are of the "same order of size", then reallocation is unlikely, or can be amortized after the first assignment.
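The growth behaviour described above can be observed with a small sketch. The function name count_reallocations is made up for illustration, and the exact growth factor is implementation-defined, so the count differs between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times push_back changes the capacity (i.e. reallocates)
// while appending n elements to an initially empty vector.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // capacity grew: storage moved
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

For 1000 appended elements the count grows roughly with the logarithm of the size rather than with the size itself, because the capacity grows geometrically. Calling v.reserve(n) before the loop would reduce it to a single allocation.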

...

operator= implies to me that you want to forget what's currently being held and assume a copy of the data that the other object has.

Don't get me wrong, I agree with you. I am just pointing at the inconsistency in the design of multi_array, but also trying to understand the origin of the problem and to think of possible ways around it.

Now that you mention the example of std::vector, I am wondering whether such manual control of multi_array's reserved space is THE solution. Something like

A.reserve({{shape1,shape2,shape3}}); // or just reserve(shape1*shape2*shape3)

could be the compromise solution for everyone. This could elegantly follow the design of std::vector, at least partially. Ronald? (I would say automatic growing is a bad idea for multi_arrays, but manually reserving space can't hurt.)

Even for the solution I gave in my previous post I would need something like reserve, because resize actually copies and/or constructs elements, and I don't need that because the elements will be overwritten by the assignment anyway. That would be my first application of reserve.

Regards, Alfredo

Peter Barker

27 Feb

1:49 p.m.

On Thu, Feb 26, 2009 at 8:14 PM, alfC <alfredo.correa@gmail.com> wrote:

...

Now that you mention the example of std::vector, I am wondering whether such manual control of multi_array reserved space is THE solution. something like

A.reserve({{shape1,shape2,shape3}}); // or just reserve (shape1*shape2*shape3)

Would this still make it necessary to write an operator= in a class that has a multi_array as a member? That's the main thing I'd like to avoid as it introduces a maintenance burden to ensure all the other members are copied. Because of that I think multi_array::operator= must be the exception to a strategy on avoiding reallocations. If the existing space can be used, then all well and good - have that optimisation. Thanks for explaining a bit more about multi_array and comparison of it with std::vector. Regards, Pete

alfC

28 Feb

12:42 a.m.

...

...
something like A.reserve({{shape1,shape2,shape3}}); // or just reserve (shape1*shape2*shape3)

Would this still make it necessary to write an operator= in a class that has a multi_array as a member?

Yes, you will still need to write such an operator, but at least it *can* be done efficiently (i.e. without the useless copying that is performed if you first resize the matrix with A.resize()). I have been thinking for a long time about proposing a "reserve" method, for two reasons:

* First, because I think that, with the current design, it is the only way to implement a resize-and-assign efficiently; you can then wrap this into a new object that has your desired semantics (I propose you call it small_multi_array, for example).

* Second, because for some numerical libraries I sometimes need to allocate extra storage beyond the end of the multi_array!! And it is not an esoteric numerical library; it is the famous FFTW3, whose MPI version needs to allocate more space than the multidimensional array itself requires because of algorithmic requirements. http://www.fftw.org/fftw3.3alpha_doc/Simple-MPI-example.html#Simple-MPI-exam....

My current approach is very dirty: I have an object that holds a std::vector v, which is only used to "reserve" enough space, and a multi_array_ref with the proper dimensionality and shape, which "points" to &v[0]. In some situations (because of the needs of FFTW3) the size (or reserved space) of the vector is larger than what is needed to reference all the array indexes. In this way I have total control of the allocated memory of the multi_array (which is not a multi_array anymore but a multi_array_ref). I have to do that just because there is no "reserve".

As you see, you are not the only one having to do dirty tricks with multi_array. At this point you may ask: why use multi_array at all? Well, I still find other features of the library very convenient, like straightforward indexing, index bases, strides, subarrays and array views.

...

That's the main thing I'd like to avoid as it introduces a maintenance burden to ensure all the other members are copied.

Yeah, hmm, sorry, it doesn't solve that problem. But at least with 'reserve' it *can* be solved.

Let's try to find a solution at a higher level. Programs that deal with arrays, for example Matlab, don't have this problem on assignment (op=) because they use copy-on-write for arrays in the first place. That means nothing is reallocated or copied on calling operator=; only when *at least one* element is modified do the reallocation and the copy happen together (like with the current resize()!). BTW, does anybody know what the underlying Fortran strategy is?

Maybe we should stop thinking about "improving" MultiArray, take it as given, and start thinking about adapting it nicely (with a small layer of code on top of MultiArray) into a shared/copy-on-write type of pointer that resembles multi_array as much as possible. It is not that I know exactly what to do; I am trying to think out loud (and hoping the smart Boost developers will hear).

Going back to your specific problem: if the arrays you are handling are big, have you thought of keeping the multi_array in a shared_ptr (or a copy-on-write sort of thing) that is a member of your class? If the arrays are small and you can afford reallocation and spurious copies, then wrap the multi_array in a small_multi_array class with the expected semantics.
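The shared_ptr / copy-on-write suggestion can be sketched generically. The wrapper name cow and its members are hypothetical, and the sketch is written against the standard library only so that it works for any copy-constructible array type; boost::multi_array<double, 2> would be one possible Array, since its copy constructor has no shape precondition. The use_count() check is not safe under concurrent access, so treat this as a single-threaded illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write holder. Copying or assigning a cow<Array>
// only copies a pointer, so the compiler-generated copy operations of a
// class with a cow<Array> member never call Array::operator=.
template <typename Array>
class cow {
public:
    explicit cow(Array value)
        : ptr_(std::make_shared<Array>(std::move(value))) {}

    const Array& read() const {
        assert(ptr_ && "cow object was moved from");
        return *ptr_;
    }

    // Detach before writing: if the array is shared, copy-construct a
    // private copy first so other holders are not affected.
    Array& write() {
        assert(ptr_ && "cow object was moved from");
        if (ptr_.use_count() > 1) {
            ptr_ = std::make_shared<Array>(*ptr_);
        }
        return *ptr_;
    }

    bool shares_with(const cow& other) const { return ptr_ == other.ptr_; }

private:
    std::shared_ptr<Array> ptr_;
};
```

After b = a; both objects refer to the same array regardless of their previous shapes, and the first call to b.write() performs the deferred copy. The cost is that every mutating access has to go through write().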

...

Because of that I think multi_array::operator= must be the exception to a strategy on avoiding reallocations. If the existing space can be used, then all well and good - have that optimisation.

Given the current design: yes, I agree. But also remember that MultiArray is not only about multi_array; there are many other classes in the library where operator= will still work with the "restricted" semantics. For example, subarrays can still be assigned, but the sizes have to match. For example:

multi_array<double, 2> A(extents[5][5]);
multi_array<double, 1> B(extents[5]);
multi_array<double, 1> C(extents[4]);

...currently you can do:

A[3]=B; // A[3] is of type subarray

but you can't do:

A[3]=C; // asserts false in the same way as mismatched multi_arrays

Should we complain about that too, because it is an operator= call that fails in some cases?

Alfredo

alfC

26 Feb

8:21 p.m.

...

In my own multi-array like class, I faced this dilemma. What I did was using template boost::parameters to specify policy on allocation in the type and had the user make choice between throwing, static_asserting, silently reallocating, preserving order etc

Can you illustrate your design a little? Is the policy part of the state of the class? Is your multi_array class written on top of Boost.MultiArray? If not, did that require a small amount of code, or basically another gargantuan-sized library? Didn't you have problems with all the other derived types like subarrays and array_views, or did you just not need them?

...

Why not using such policies ? I know it involves drastically changing multi_array interface but maybe it's worth the hassle.

Why the "interface changes"? (Except for the redefined operator=, I guess.)

Thank you, Alfredo

Joel Falcou

8:43 p.m.

alfC a écrit :

...

can you illustrate a little bit your design? Is the policy part of the state of the class?

Sample code, for example:

// 3D matrix with no-realloc semantic and base index of [1 1 1]
matrix< float, settings(3d_, no_realloc, base_index<1,1,1>)> m( ofSize(4,4,4) );

The policy is part of the type signature and modifies how the internals of the matrix work. settings(...) is a shortcut to gather a large number of parameters. This type is then introspected internally using boost::parameters. ofSize(a0,...,an) is a function returning an nD extent object carrying the matrix's dimension sizes.

...

Is your multi_array class written on top of Boost.MultiArray. If not did that require small amount of code or basically another gargantuan sized library ?

It is "gargantuan", as it's basically a large compilation of tools for quickly turning Matlab code into C++ with the fewest changes possible. But I think the core of this thing can be extracted.

...

why the "interface changes"? (except for the a redefined operator= I guess

I was thinking of the changes to the type signature, sorry.

-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35
