Hi Andrzej,
On Tue, Sep 24, 2024 at 11:32 PM Andrzej Krzemienski wrote:
On Wed, 25 Sep 2024 at 04:27, Alfredo Correa via Boost wrote:
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
So, Artyom says that storage and element access alone is insufficient to warrant the existence of a library. Alfredo says something opposite.
If that is the point of the discussion and I didn’t realize it, that would be a very fair point from Artyom. Alfredo, what would help here is if you demonstrated that your library has
users, and have the users say why they chose it, given that they have to get the algorithms from elsewhere. Maybe you are such a user?
I don’t think it would help, but I want to be transparent, and this is a good time to do an assessment.

Project users (listed in the docs):
- https://github.com/QMCPACK/qmcpack (294 stars, 137 forks, 36 watching)
- https://github.com/llnl/inq (23 stars, 4 forks, 10 watching; 22 stars, 16 forks on GitLab)
(Disclaimer: I am or was involved in the two projects above.)
- Another AFQMC (auxiliary-field quantum Monte Carlo) simulation code at the Flatiron Institute, still in a private repository and soon to be open source according to the authors (I am not involved).

Other users:
- 6 stars, 2 watching on the GitHub repo
- 13 stars, 5 forks on the GitLab repo
- About 100 issues by associated developers; I tracked about 2 or 3 issue openers who work at independent groups (quite advanced users, if I may say so).
- cpplang Slack #boost-multi channel: 18 members
(Sorry if there is double counting; this is the best I can do.)

Most of the praise, from colleagues at least, is about the flexibility with allocations, in particular the ability to separate allocations from array lifetime (i.e., memory pools), which solves 50% of the problems of value semantics with big arrays, especially on the GPU. Other praise was about how straightforward it was to incrementally rewrite old code.
The rest of my post is "academic", as I do not have experience in the field.
Having only the storage and access abstraction would be preferred over a framework, if for your particular use case you have to employ two domains, like image-processing and generic AI/ML, and you want two sets of algorithms applied to the very same data. Then there may be no single framework that satisfies your need, and you may need to make two frameworks interoperate.
Yep, that is another problematic aspect of frameworks. In general the combinatorial explosion of tools necessary to make
Next, the analogy to STL alone is not good enough, I think. It is on you to demonstrate that the idea of generic programming also applies to *real life* usages of big multidimensional arrays. The STL itself has been criticised on the grounds that, because it is generic, it cannot be optimized for particular types (https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon).
Fair enough. The STL is opt-in, and internal algorithms may be forwarded to the STL; so far I haven't needed other libraries, except for GPUs. But I am open to discussing this further and improving the library if necessary.

The STL is not perfect. In fact, I found a couple of obvious problems with how traits are used in the STL, and the library may not fit *optimally* with some STL concepts regarding iterator categories. The main pain points, if you are curious, are that std::copy_n is not usable with broadcasted arrays (I opened a GCC library bug with A. O’Dwyer regarding this) and that std::sort makes some overgeneralized assumptions about the trade-off between a copy (which allocates and can throw) plus n moves versus n swaps.

Incidentally, this is related to the video you linked; it is exactly the point I found. I believe that:
1) Multi iterators may fall into a new category of iterators, for which I don't have a name yet, perhaps between random-access and bidirectional, in order to handle broadcasted arrays.
2) std::sort (and other STL algorithms) is not customized enough, at least in the place I found, where a "rotation-by-1" is implemented as a copy to a temporary and n copies instead of n swaps, even if n swaps would be better for some element types and "row" sizes (and would be noexcept). This is the infinite customization that Andrei talks about at https://youtu.be/FJJTYQYB1JQ?t=4475; thank you for reminding me of this talk.

It would be a fun, if academic, exercise to see if I can apply Andrei's super-duper sort to array rows. I might discover something new as well. Notice that even though Andrei criticizes the STL, he at least still uses iterators, which is enough for me. I would welcome any introspection mechanism that he would need for this.

(There are other aspects that make the library useful, such as the flattening of arbitrary subarrays, which effectively does a kind of loop fusion and makes the use of the STL even more attractive.)
In the context of big multi-dimensional arrays, we are talking about heavy computations. And maybe the data structures not optimized for specific use cases are simply disqualified from the outset.
yes
Please, treat it as a hint on how to communicate your ideas to people in this forum, in order to convince them.
Absolutely, I appreciate it.
Regards, &rzej;