Re: [boost] [Multi] Proposal

25 Sep 2024

Hi Andrzej,

On Tue, Sep 24, 2024 at 11:32 PM Andrzej Krzemienski <akrzemi1@gmail.com>
wrote:
...
Wed, Sep 25, 2024 at 04:27 Alfredo Correa via Boost <boost@lists.boost.org>
wrote:
...
...
...
Second, Multi is not a numerical library specifically; it is about the
logic and semantics of multidimensional arrays and containers,
regardless of the element type.
...
See, this is exactly the problem. Why would I need something like that
if I need to go to all the 3rd party libraries to actually use one
efficiently?
For the same reason some of us use the standard library containers, or ranges,
etc., even if they don't "do everything".
So, Artyom says that storage and element access alone is insufficient to
warrant the existence of a library. Alfredo says something opposite.
If that is the point of the discussion and I didn’t realize it, that would
be a very fair point from Artyom.

Alfredo, what would help here is if you demonstrated that your library has
...
users, and have the users say why they chose it, given that they have to
get the algorithms from elsewhere. Maybe you are such a user?
I don’t think it would help, but I want to be transparent, and it is a good
time to do an assessment:

Project users (list in the docs)

https://github.com/QMCPACK/qmcpack (294 stars, 137 forks, 36 watch)

https://github.com/llnl/inq (23 stars, 4 forks, 10 watch; 22 stars, 16
forks in gitlab)

(disclaimer: I am or was involved in the projects above)

Another AFQMC (auxiliary-field quantum Monte Carlo) simulation code at the
Flatiron Institute, still in a private repository, soon to be open
source according to the authors (I am not involved).

other users:

6 stars, 2 watching in github repo
13 stars, 5 forks in gitlab repo
about 100 issues by associated developers, and
I tracked about 2 or 3 issue openers who work in independent groups
(quite advanced users, if I may say so).

cpplang Slack #boost-multi channel: 18 members.

(sorry if there is double counting, this is the best I can do)

Most of the praise, from colleagues at least, is about the flexibility with
allocations.
In particular, the ability to separate allocations from array lifetime (i.e.,
memory pools), which solves 50% of the problems of value semantics with big
arrays, especially on the GPU.
Another point was how straightforward it was to incrementally rewrite old code.
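
To make the point about separating allocation from array lifetime concrete, here is a minimal sketch. It deliberately uses the standard std::pmr facilities instead of Multi's own interface, and the names sum_of_iota and run_with_pool are hypothetical, chosen only for this illustration. Several short-lived arrays draw their storage from one buffer that is reserved once and outlives all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Multi: fills a rows x cols array with
// 0, 1, 2, ... and returns the sum. Storage comes from `pool`.
double sum_of_iota(std::pmr::memory_resource* pool, std::size_t rows, std::size_t cols) {
    assert(pool != nullptr);
    // The array dies at the closing brace, but it does not own the memory
    // resource: the pool outlives it and serves the next array too.
    std::pmr::vector<double> data(rows * cols, pool);
    std::iota(data.begin(), data.end(), 0.0);
    return std::accumulate(data.begin(), data.end(), 0.0);
}

// Runs the computation several times on top of one pre-reserved buffer.
double run_with_pool() {
    std::vector<std::byte> buffer(1 << 16);  // reserved once, up front
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());
    double total = 0.0;
    for (int i = 0; i != 3; ++i) {
        total += sum_of_iota(&pool, 4, 8);  // 32 doubles per call
    }
    return total;
}
```

The arrays keep value semantics, while the cost of acquiring memory is paid once by whoever owns the pool. On a GPU the same idea would apply with a device memory resource in place of the host buffer, which is an assumption of this sketch and not something shown here.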
...
The rest of my post is "academic", as I do not have experience in the
field.
Having only the storage and access abstraction would be preferred over a
framework, if for your particular use case you have to employ two domains,
like image-processing and generic AI/ML, and you want two sets of
algorithms applied to the very same data. Then there may be no single
framework that satisfies your need, and you may need to make two frameworks
interoperate.
Yep, that is another problematic aspect of frameworks. In general the
combinatorial explosion of tools necessary to make
...
Next, the analogy to STL alone is not good enough, I think. It is on you
to demonstrate that the idea of generic programming also applies to *real
life* usages of big multidimensional arrays. STL itself has been criticised
that because it is generic, it cannot be optimized for particular types. (
https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon)
Fair enough. The STL is opt-in: internal algorithms may be forwarded to the
STL, and so far I haven't needed other libraries, except for GPUs.
But I am open to discussing this more and improving it if necessary.

The STL is not perfect. In fact, I found a couple of obvious problems with how
traits are used in the STL, and maybe the library doesn’t *optimally* fit
some STL concepts regarding iterator categories. The main pain
points, if you are curious, are that std::copy_n is not usable with
broadcasted arrays (I opened a GCC lib bug with A. O’Dwyer regarding this)
and that std::sort makes some overgeneralized assumptions about the trade-off
between a copy (which allocates and can throw) plus n moves versus n swaps.

Incidentally, this is related to the video link you sent; this is exactly
the point I found. I believe that 1) Multi iterators can
fall into a new category of iterators, for which I don't have a name yet,
perhaps between random access and bidirectional, to handle broadcasted
arrays; and 2) std::sort (and other STL algorithms) is not customized enough,
at least in the place I found, where a "rotation-by-1" is implemented as a
copy to a temporary and n copies instead of n swaps, even if n swaps would
be better for some element types and "row" sizes (and be noexcept). (This
is the infinite customization that Andrei talks about at
https://youtu.be/FJJTYQYB1JQ?t=4475; thank you for reminding me of this
talk.)
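
The two ways of rotating by one can be sketched as follows. This is an illustration only, not the code of std::sort or of Multi: flat2d is a hypothetical stand-in for a 2D array whose "elements" are whole rows, and both function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D array: rows x cols ints in one flat buffer.
struct flat2d {
    std::size_t rows, cols;
    std::vector<int> data;
    int* row(std::size_t i) { return data.data() + i * cols; }
};

// Strategy A: copy the last row to a temporary (allocates, may throw),
// shift every other row down by one, then copy the temporary to the top.
void rotate_rows_with_temporary(flat2d& a) {
    assert(a.data.size() == a.rows * a.cols);
    if (a.rows < 2) { return; }
    std::vector<int> tmp(a.row(a.rows - 1), a.row(a.rows - 1) + a.cols);
    for (std::size_t i = a.rows - 1; i != 0; --i) {
        std::copy_n(a.row(i - 1), a.cols, a.row(i));
    }
    std::copy_n(tmp.begin(), a.cols, a.row(0));
}

// Strategy B: bubble the last row to the top with rows - 1 row swaps.
// No temporary row is created, so nothing is allocated.
void rotate_rows_with_swaps(flat2d& a) {
    assert(a.data.size() == a.rows * a.cols);
    for (std::size_t i = a.rows; i > 1; --i) {
        std::swap_ranges(a.row(i - 1), a.row(i - 1) + a.cols, a.row(i - 2));
    }
}
```

Both functions produce the same final order. Which one is faster depends on the element type and the row length, and that is the kind of choice a generic algorithm cannot make without some customization point.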

It would be a fun, if academic, exercise to see if I can apply Andrei's
super-duper sort to array rows.
I might discover something new as well.
Notice that even though Andrei criticizes STL, he at least still uses
iterators, which is enough for me.
I would welcome any introspection mechanism that he would need for this.

(There are other aspects that make the library useful, such as the
flattening of arbitrary subarrays, which effectively does a kind of
loop fusion and makes the use of the STL even more attractive.)
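
As a rough picture of what flattening a subarray means, here is a minimal sketch. It is not how Multi implements it; subblock and sum_flattened are hypothetical names. A rectangular block inside a row-major array is exposed as one linear sequence, so a single loop replaces the usual pair of nested loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a sub-block inside a row-major 2D array.
struct subblock {
    const double* base;      // first element of the sub-block
    std::size_t rows, cols;  // extents of the sub-block
    std::size_t stride;      // distance between consecutive rows in the parent
    std::size_t size() const { return rows * cols; }
    // Element k of the flattened sequence, in row-major order.
    double operator[](std::size_t k) const {
        assert(k < size());
        return base[(k / cols) * stride + (k % cols)];
    }
};

// One loop over the flattened sequence instead of two nested loops.
double sum_flattened(const subblock& s) {
    double total = 0.0;
    for (std::size_t k = 0; k != s.size(); ++k) { total += s[k]; }
    return total;
}
```

Once the block looks like a single sequence, it can be handed to any algorithm that expects a one-dimensional range.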
...
In the context of big multi-dimensional arrays, we are talking about heavy
computations. And maybe the data structures not optimized for specific use
cases are simply disqualified from the outset.
Yes.
...
Please, treat it as a hint on how to communicate your ideas to people in
this forum, in order to convince them.
Absolutely, I appreciate it.
...
Regards,
&rzej;