Date: Sat, 21 Sep 2024 20:10:25 +0300 From: Artyom Beilis
I do scientific computing with large arrays, and nobody uses OpenCV. It would be fair if you had said that it is the de facto standard for image processing, which I don't do.
Ok... just because you haven't used it does not mean that it isn't a very common computing library that many developers/designers go to by default.
Maybe we don't agree on what numeric means, or we just work in different environments. Anyway, I would agree that OpenCV is unquestionably a standard for image processing.
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
cv::Mat is a numpy-like NDArray with strides, windows, and offsets (yes, it supports more than two dimensions). I have myself used and written several tensor-like objects (PyTorch C++ tensors, dlprimitives, and others). It is nothing new; by all means it is the easiest part of any numerical library.
I don't know how generically OpenCV can treat strides, windows, offsets, etc.; I trust you. Again, this boils down to comparing apples with oranges, frameworks vs. components. I don't have experience with OpenCV; I have seen code using OpenCV, and it seems it has a lot of features: you can load images, render windows, and do some array computations. It doesn't look like generic code to me; the STL cannot be used directly, for example (which is fine). I propose that, if you need to read images and do the standard things that OpenCV is built for, such as image processing, and probably much beyond, by all means use OpenCV. And I propose that if you need to do generic programming with standard algorithms, iterators, and some level of abstraction, use Multi, or something else. If you are not interested in the second (which is fine) then you are not the right audience for the proposed library.
2. While actual ndarray handling is nice, it is basically the tip of the iceberg. You need a decent set of high-performance algorithms combined with it.
I couldn't agree more. The thing that I appreciate about generic programming and programming with components is that it separates data structures and algorithms. And if one does it right, the goal is no loss of efficiency. Of course this requires a lot of work, especially for library developers.
There is a clear separation of concerns here. The Multi library deals with the data structure; it doesn't provide algorithms. It uses, when it needs to fulfill its semantics, the best *existing* algorithms it can, in awareness of the data structure: algorithms that are provided to it via different mechanisms.
But if you don't provide algorithms, maybe I'd better take a library/framework that does.
Exactly, that is the appeal of frameworks. Frameworks are great if they do all that you need, and no less. The moment you need to do something that the framework has not contemplated, you are totally on your own, without much help.
There are plenty of numpy-like arrays around. Usually they are called tensors...
I agree, and almost all of them are frameworks. I don't think any of them work with existing STL algorithms and iterators. If you are not interested in this, this is not for you.
I agree, promising all of linear algebra is infinite work, like reimplementing MATLAB or Mathematica, but BLAS has a finite number of functions. The philosophy of the BLAS adaptor in particular (again, an optional component) is to interface to what BLAS offers and no more. It is more of a model of how to interface a legacy library using the features of Multi.
But that is exactly what makes OpenCV useful and multi-array like a facade with emptiness behind.
This is like saying all the STL containers are just "emptiness behind". In some sense it is true; you need algorithms, and those exist separately. I understand your feelings. The emptiness that you feel, the "rest", is the algorithms and adaptors that exist separately from the data structure. There are excellent libraries for that, starting from the STL.
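To make that concrete, here is a minimal sketch of "the algorithms exist separately" in practice; the include path and the `.elements()` flattened range are taken from Multi's documentation, so treat the exact spellings as assumptions:

    #include <boost/multi/array.hpp>  // include path assumed from the docs
    #include <algorithm>
    #include <numeric>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A({4, 5}, 0);  // 4x5 array, zero-filled
        std::iota(A.elements().begin(), A.elements().end(), 0);  // STL fills elements
        std::rotate(A.begin(), A.begin() + 1, A.end());          // STL rotates whole rows
    }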
I did a brief look into the implementation and it seems to be lacking vectorization (use of intrinsics/vector operators), or have I missed it?
You missed that this is a generic, not specifically numerical, library.
But you are making a numpy-like library...
You keep bringing up numpy. Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context. If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.
otherwise you wouldn't be interfacing cblas.
Would you feel better about it if I removed the BLAS adaptor? It is a completely optional component, there to make the library more immediately useful without converting it into a framework. The interface is strictly about using BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.
See, if you had been talking about multi-array as an advanced std::vector of generic objects... Ok - but you don't
"Advanced std::vector of generic objects" is a fair scope for the library, thank you. I would say it is *very* advanced version of std::vector, but this is just an opinion I have. I don't understand exactly what is your complain, what do you mean with "but you don't". This is what I say in the introduction: "This library offers array containers and subarrays in arbitrary dimensions with well-behaved value semantics, featuring logical access recursively across dimensions and to elements through indices and iterators." I don't know where you read numerics in there. you direct it to the numeric computations.
I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library. I use built-in types (preferably ints) in the examples for conciseness. I just say that if you want to use it in a numerical context, it still should be ok for that. The reason I don't and won't explicitly rule out numeric applications in the documentation is that, in my opinion, there are no obstacles to using the library in a numerical context. The library is in fact used in numerical contexts by other projects.
The other thing to take into account is that vectorization/parallelization is still provided by the external algorithms the library uses internally. For example, when dealing with GPU or OpenMP arrays, the library uses Thrust algorithms if they are available, which are parallel.
Just for the record, there are two levels of parallelization at the CPU level: 1st, thread-based parallelization; 2nd, SIMD-level parallelization like SSE/AVX2/Neon, where you load vectors of 16 or 32 bytes of data and process them together in a single instruction. These can increase the performance significantly.
Yes, I gave the example about thread parallelization, through execution policies, because it is cleaner. SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts (this is the reason BLAS matrices always have at least one stride = 1). This is not the general case when you manipulate arrays dynamically. But I agree it still can be applied with some effort. I would do everything I can to offer the possibility of doing SIMD through the library in combination with external algorithms, and lift any obstacles to it, if there are any. Having said that, it is not a big concern for the library implementation because 1) it is not numeric specifically, 2) semantic operations in the library, such as assignment or construction, do not involve computation. However, if you have a clear idea how to enable SIMD for a particular problem you are free to do it, once again, through specialized algorithms.
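For reference, a hedged sketch of the thread-level parallelization mentioned above, pairing an array's row range with a standard C++17 execution policy (the policy overloads are standard; the Multi side of the interface is assumed):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>
    #include <execution>

    namespace multi = boost::multi;

    void scale_rows(multi::array<double, 2>& A) {
        // each row is handled by the parallel STL; no Multi-specific machinery
        std::for_each(std::execution::par, A.begin(), A.end(), [](auto&& row) {
            for (auto&& x : row) { x *= 2.0; }
        });
    }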
OpenCV isn't 2D-only. It supports n-D tensors with views.
Ok, I stand corrected then. @Artyom, can you help me understand how it compares in these aspects that are important for my library? I bet OpenCV will do very well in the comparison. It is a simple checkbox list (a small sketch after the list illustrates a couple of the rows).
- External deps?
- Arbitrary number of dims (e.g. 11 dimensions)
- Non-owning view of data (e.g. manipulate memory provided by others)
- Compile-time dim size
- Array values (owning data) (e.g. can I put arrays inside a std::list?)
- Value semantics (Regular) (can I assign, swap, with expected Stepanov regularity results?)
- Move semantics (e.g. will `arr2 = std::move(arr1)` copy data?)
- const-propagation semantics (e.g. `Mat const arr; ...` is arr really read-only?)
- Element initialization (e.g. can the arrays be initialized at declaration, e.g. `Mat const arr = {1.0, 2.0, ...}`?)
- References w/ no rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)
- Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr(3, 4, 5, 6)`?)
- Partial element access (e.g. take the n-th column or n-th row of a 2D array)
- Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
- Subarray with lower dim (e.g. generate a "view" nD subblock of an mD array, where n < m)
- Subarray w/ well-defined layout (e.g. access the layout of a subblock, if subblocks can be referred to)
- Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries?)
- Custom allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)
- PMR allocators (e.g. use std::pmr::monotonic_memory_resource)
- Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)
- Stride-based layout (e.g. supports strided layouts and can give this information to low-level libraries)
- Fortran ordering (e.g. the left index is the fast index in memory)
- Zig-zag / tiles / Hilbert ordering (e.g. fancy layouts beyond strides)
- Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag?)
- Flattening of elements (e.g. any facilities to look at the elements in a flattened way, beyond simply giving a pointer to the first element, which will not work for subblocks)
- Iterators (e.g. does the array have, in any useful sense, .begin and .end?)
- Multidimensional iterators (cursors) (e.g. `auto h = arr_subarray.home();` h gives access to elements but is as light as a pointer)
- STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm?)
- Compatibility with Boost (e.g. put arrays in Boost containers, use Boost Serialization, Boost.Interprocess, Boost algorithms)
- Compatibility with Thrust or GPUs (e.g. can the array elements be in the GPU, using the array through its interface without segfaulting, or use thrust::device_pointer memory?)
- Used in production (e.g. major users or industries)
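As an illustration of the "value semantics" and "move semantics" rows above, this is roughly what the checklist probes (a sketch, assuming Multi's documented Regular-container behavior):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <cassert>
    #include <utility>

    namespace multi = boost::multi;

    int main() {
        multi::array<double, 2> a({1000, 1000}, 1.0);
        multi::array<double, 2> b = a;             // value semantics: deep, independent copy
        multi::array<double, 2> c = std::move(a);  // move semantics: buffer is stolen, no copy
        assert(b == c);                            // Regular: equality compares the elements
    }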
I consider that Eigen, OpenCV, Kokkos, PETSc are frameworks.
It reminds me of a comparison of Boost.Beast and a full-scale framework like CppCMS. When I reviewed Beast it was clear that it does not do 10% of what is expected from something to make an actually useful web application.
I am not an expert on Boost.Beast, but it looks like you are not the typical audience for some of Boost libraries. The key to me, although not exclusively, is the availability of generic components and generic programming.
While it is nice to have an abstraction - if so, either keep it basic or go all the way - you are stuck somewhere in between std::vector++ and something like OpenCV.
This is a fair assessment. If you see resemblances with OpenCV, that is welcome but accidental, since it is a library that I don't use. Sorry if you are disappointed that this library doesn't do (directly at least) things that OpenCV does; the library definitely can do other things that OpenCV can't (and you are not interested in), and even if OpenCV can do them, it will do them with an interface that is not within the scope of the goals of my library.
I don't feel confident adding an OpenCV column because I don't have experience with it, but feel free to help me by adding the library and answering the points of each row in the comparison table.
I suggest getting some experience with OpenCV. It is a very good library that already implements what you have (also in a different way).
I doubt OpenCV implements everything that Multi has. But even if it does, it does it with a different interface, and the interface was a priority for my design. (To be fair, Multi doesn't implement everything that OpenCV does either, for sure.)
It ain't perfect by any means. But it works, is well debugged and understood, and is a widely available library that does the job.
I don't doubt this. I think it is good that both libraries are different. I don't need all the bells and whistles that OpenCV has, and it is undeniably a heavy dependency. I don't need to render images, load images, or be optimized for the element types that OpenCV offers.
What is not clear to me is why I should use it over some existing solution like OpenCV.
- Because sometimes your elements are not numerical types.
Yeahhh... don't buy it. Sorry :-)
Yeah, but I do need a 100x100 array of std::strings, and a 20x10 array of std::tuples. I would love to know if OpenCV can store those.
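For instance, a minimal sketch of what is meant here, assuming Multi's documented constructor taking extensions plus a fill value:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <string>
    #include <tuple>

    namespace multi = boost::multi;

    int main() {
        multi::array<std::string, 2> names({100, 100}, "hello");  // non-numeric elements
        multi::array<std::tuple<int, double>, 2> pts({20, 10}, std::tuple{0, 0.0});
        names[99][99] = "world";  // ordinary element access, no encoded element types
    }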
- Because sometimes you want to specify your element types as template parameters, not as OpenCV encoded types.
To make your code compilation slower and more horrible? Actually OpenCV supports templated accessors.
This is the perennial discussion between header-only and pre-compiled libraries. I don't have anything to add; the idea is that you pay at compilation for what you use. Feel free to try the library on Godbolt to see the compilation times and the machine code it produces. It is a pity that OpenCV is not on Godbolt so we could compare both online. Having said that, I would welcome any comparison in timing and usability between the two libraries within or around the scope of what I am proposing.
- Because sometimes you want arbitrary dimensionality (e.g. 2D, 3D, 6D) to be compile-time.
And why isn't it possible with OpenCV?
I don't know, is it possible?
From the examples I see online, all OpenCV arrays, regardless of dimensionality, have the same type. Maybe I am mistaken.
- Because sometimes you want to apply generic algorithms to your arrays (STL, std::sort, std::rotate, std::ranges, boost::algorithms, serialization).
Yeah... good luck with that in numerical context. But Ok.
I use the library in numerical contexts, and I have no problem with that: https://gitlab.com/npneq/inq https://github.com/QMCPACK/qmcpack
In OpenCV you can just get several pointers and work with them
You cannot have it both ways. If you need a pointer and the sizes (and strides), Multi gives you access to that too; and then, according to your definition, it would be equally as powerful as OpenCV. https://gitlab.com/correaa/boost-multi#legacy-libraries-c-apis
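A hedged sketch of that escape hatch, with the accessor names (`base()`, `sizes()`, `stride()`) assumed from the linked README section; `legacy_c_function` is a hypothetical C API:

    #include <boost/multi/array.hpp>  // include path assumed

    extern "C" void legacy_c_function(double* data, int n0, int n1, int stride0);

    namespace multi = boost::multi;

    void call_legacy(multi::array<double, 2>& A) {
        auto const [n0, n1] = A.sizes();  // extents of each dimension
        // hand the raw pointer and the leading stride to the C API
        legacy_c_function(A.base(), static_cast<int>(n0), static_cast<int>(n1),
                          static_cast<int>(A.stride()));
    }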
- Because sometimes you want to implement functions that are oblivious to the actual dimension of your array (e.g. simultaneously view a 3D array of elements as a 2D array of something, for abstraction).
- Because sometimes you want to control allocations, use fancy pointers, parameterized memory strategies, polymorphic allocators.
OpenCV supports custom allocations (actually something I am using right now, exactly to monitor memory consumption).
That is great, can you point me to an example?
From what I quickly see online, OpenCV provides its own allocators. I am not interested in non-standard allocators for this library.
It is not clear to me whether you can use standard allocators and PMR allocators in OpenCV. Can it take allocators that do not return raw pointer types, such as for the GPU? And if not, what if you allocate a raw pointer with cudaMalloc; how much of the library can you still use?
Once again - be careful what you wish for. Templated allocators are a huge headache to work with in comparison to run-time ones (in the real, non-fancy-template world that Boost loves).
It is not a matter of what I wish. The library is not a framework; allocation is something that the user of the library brings in. I don't choose it for them.
- Because sometimes you want precise value semantics
I'm not sure it is a great idea for huge arrays/matrices. Virtually ALL tensor libraries have reference semantics for a very good reason.
Sometimes you need it, sometimes you don't. If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library. I prevent the library from making copies as much as I can, until the user really means it.
- Because sometimes you want to work with subblocks of an array in the same way you work with the whole array.
And this is exactly what a view is for... And this is common to any tensor library.
Great, I would feel bad if I didn't provide this fundamental feature. The next question is whether arrays and subarrays/subblocks/views can be treated on an equal footing.
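A small sketch of that "equal footing", with the brace-range call syntax assumed from Multi's documentation:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A({10, 10}, 0);
        auto&& sub = A({2, 5}, {3, 7});  // 3x4 subblock view, no copy
        // the view goes wherever the whole array goes, e.g. into an STL algorithm
        std::fill(sub.elements().begin(), sub.elements().end(), 1);
    }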
- Because sometimes you want to give guarantees in the presence of exceptions (the library doesn't throw exceptions, to be clear).
Ok, this is one of the points I want to discuss here in terms of design, exceptions, and broadcasting. Let's take an example (numpy, but the same works for torch::Tensor):
    a = np.ones((5, 10))
    b = np.ones((2, 5, 1))
    c = a + b
It would perform broadcasting to shape (2, 5, 10) automatically. How can it be done? You broadcast the shape of a to (2, 5, 10) using strides (0, 10, 1) and broadcast b to the same shape using strides (5, 1, 0) (I hope I have not made a mistake in the calculations). Then you can easily run on shape (2, 5, 10) and do all the calculations, fetching, etc.
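For the record, the stride trick described above, written out as a self-contained illustration in plain C++ (indexing only; this is not Multi's or numpy's API):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> a(5 * 10, 1.0), b(2 * 5 * 1, 1.0), c(2 * 5 * 10);
        long const n[3]  = {2, 5, 10};  // broadcast target shape
        long const sa[3] = {0, 10, 1};  // a(5,10) viewed as (2,5,10): stride 0 repeats it
        long const sb[3] = {5, 1, 0};   // b(2,5,1) viewed as (2,5,10): stride 0 repeats it
        for (long i = 0; i != n[0]; ++i)
            for (long j = 0; j != n[1]; ++j)
                for (long k = 0; k != n[2]; ++k)
                    c[(i*n[1] + j)*n[2] + k] =
                        a[i*sa[0] + j*sa[1] + k*sa[2]] + b[i*sb[0] + j*sb[1] + k*sb[2]];
        std::printf("%g\n", c[0]);  // prints 2; every element of c is 1 + 1
    }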
Looking into your broadcasting docs left me with the impression that it does not support anything like this.
What do you mean by "anything like this"? The library doesn't make algebraic assumptions about the arrays or the elements. There is no binary addition + in the library; you can define your own if you want, using well-known algorithms or specializations of them. Once you tell me what your specification and implementation of '+' is, I can tell you how to use broadcast for your case. If it is a good example I could even add it to the documentation. Note that + implies an algebra; I don't consider it part of the scope of the library. Broadcast instead is part of the library because I consider it a non-algebraic but very powerful array manipulation feature. Broadcast is implemented in the library within a certain design. I think you are disappointed because the broadcast is not "automatic", which is a different issue.
I read this:
First, broadcasted arrays are infinite in the broadcasted dimension; iteration will never reach the end position, and calling `.size()` is undefined behavior.
Why? You certainly can iterate over the broadcasted dimension...
Yes, you can iterate, that is fine.
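For example, something along these lines should be possible, with `.broadcasted()` taken from the broadcasting docs discussed here (the rest of the interface is assumed):

    #include <boost/multi/array.hpp>  // include path assumed

    namespace multi = boost::multi;

    int main() {
        multi::array<double, 1> b = {1.0, 2.0, 3.0};
        multi::array<double, 2> out({4, 3});
        auto it = b.broadcasted().begin();  // 2D view of b, infinite along the new dimension
        for (auto&& row : out) { row = *it; ++it; }  // the finite range bounds the loop
    }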
Also, how do you handle incompatible broadcasting without exceptions - for example, if b in the example above were of shape (2,5,3) and not (2,5,1)?
The "raw" broadcast operation in the library is an operation that cannot fail. It is how the user of the broadcasted object uses it what can create problems down the road if he doesn't know what he/she is doing. The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue. Who am I to choose for them? (I do it in a certain way in the BLAS adaptor but that is a completely different story).
Exceptions are useful. And claiming that you don't throw any means you don't use new...
Exceptions are great; that doesn't mean that every library should be throwing exceptions. And you are completely correct: the library doesn't use `new`. (I couldn't, even if I wanted to, since the arrays should work in special memory too.) The user of the library provides the allocator; the library itself doesn't throw. At most, it is the allocation (or element assignment) that can throw, and I don't have control over that because it is an arbitrary element type, and the library will handle it well.
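A hedged sketch of what "the user provides the allocator" can look like, assuming Multi's allocator template parameter and the standard allocator-aware constructor convention:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <memory_resource>

    namespace multi = boost::multi;

    int main() {
        std::pmr::monotonic_buffer_resource pool;  // user-chosen memory resource
        multi::array<double, 2, std::pmr::polymorphic_allocator<double>> A(
            {100, 100}, 0.0, &pool);  // allocation comes from the pool, not from `new`
    }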
I highly doubt the practicality of the library in its current form. While a generic multi-array can be nice to have, it is actually stuck at being a little bit more than MultiArray but too far even from basic numpy.
Thank you for your assessment. I think you don't agree with the scope of the library, which would be fine. Maybe you also think that the scope of the library is more than what I state, which is something I would like to clarify in some way. I don't think this is just a little more than MultiArray, but it depends on the usage and how much you care about certain things (there is a column in the table that will help you evaluate the differences). Thank you, Alfredo
Wed, 25 Sep 2024 at 04:27 Alfredo Correa via Boost
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
So, Artyom says that storage and element access alone are insufficient to warrant the existence of a library. Alfredo says the opposite.

Alfredo, what would help here is if you demonstrated that your library has users, and have the users say why they chose it, given that they have to get the algorithms from elsewhere. Maybe you are such a user?

The rest of my post is "academic", as I do not have experience in the field.

Having only the storage and access abstraction would be preferred over a framework if, for your particular use case, you have to employ two domains, like image processing and generic AI/ML, and you want two sets of algorithms applied to the very same data. Then there may be no single framework that satisfies your need, and you may need to make two frameworks interoperate.

Next, the analogy to the STL alone is not good enough, I think. It is on you to demonstrate that the idea of generic programming also applies to *real life* usages of big multidimensional arrays. The STL itself has been criticised that, because it is generic, it cannot be optimized for particular types (https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon). In the context of big multi-dimensional arrays, we are talking about heavy computations. And maybe data structures not optimized for specific use cases are simply disqualified from the outset.

Please treat this as a hint on how to communicate your ideas to people in this forum, in order to convince them.

Regards,
&rzej;
Hi Andrzej,
On Tue, Sep 24, 2024 at 11:32 PM Andrzej Krzemienski
Wed, 25 Sep 2024 at 04:27 Alfredo Correa via Boost wrote:
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
So, Artyom says that storage and element access alone are insufficient to warrant the existence of a library. Alfredo says the opposite.
If that is the point of the discussion and I didn't realize it, that would be a very fair point from Artyom.
Alfredo, what would help here is if you demonstrated that your library has users, and have the users say why they chose it, given that they have to get the algorithms from elsewhere. Maybe you are such a user?
I don't think it would help, but I want to be transparent, and it is good timing to do an assessment.

Project users (list in the docs):
https://github.com/QMCPACK/qmcpack (294 stars, 137 forks, 36 watch)
https://github.com/llnl/inq (23 stars, 4 forks, 10 watch; 22 stars, 16 forks on gitlab)
(disclaimer: I am, or I was, involved in the projects above)
Another is an AFQMC (auxiliary field quantum Monte Carlo) simulation code at the Flatiron Institute that is still a private repository, soon to be open source according to the authors (not involved).

Other users: 6 stars, 2 watching on the github repo; 13 stars, 5 forks on the gitlab repo; about 100 issues by associated developers, and I tracked about 2 or 3 issue openers that work at independent groups (quite advanced users, if I may say). cpplang Slack #boost-multi channel: 18 members. (Sorry if there is double counting; this is the best I can do.)

Most of the praise, from colleagues at least, is about the flexibility with allocations, in particular separating allocations from array lifetime (i.e. memory pools), which solves 50% of the problems of value semantics with big arrays, especially on the GPU. Another was about how straightforward it was to incrementally rewrite old code.
The rest of my post is "academic", as I do not have experience in the field.
Having only the storage and access abstraction would be preferred over a framework, if for your particular use case you have to employ two domains, like image-processing and generic AI/ML, and you want two sets of algorithms applied to the very same data. Then there may be no single framework that satisfies your need, and you may need to make two frameworks interoperate.
Yep, that is another problematic aspect of frameworks. In general, the combinatorial explosion of tools necessary to make frameworks interoperate is a problem.
Next, the analogy to STL alone is not good enough, I think. It is on you to demonstrate that the idea of generic programming also applies to *real life* usages of big multidimensional arrays. STL itself has been criticised that because it is generic, it cannot be optimized for particular types. ( https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon)
Fair enough. The STL is opt-in; internal algorithms may be forwarded to the STL, and so far I didn't have the need to use other libraries, except for GPUs. But I am open to discussing this more and improving if necessary.

The STL is not perfect; in fact I found a couple of obvious problems with how traits are used in the STL, and the library maybe doesn't *optimally* fit some concepts in the STL regarding iterator categories. The main pain points, if you are curious, are that std::copy_n is not usable with broadcasted arrays (I opened a GCC lib bug with A. O'Dwyer regarding this), and that std::sort makes some overgeneralized assumptions about a trade-off between a copy (which allocates and can throw) plus n moves versus n swaps.

Incidentally, this is related to the video link you sent; this is exactly the point I found. I believe that 1) Multi iterators can fall into a new category of iterators, for which I don't have a name yet, perhaps between random-access and bidirectional, to handle broadcasted arrays; 2) std::sort (and other STL algorithms) is not customizable enough, at least in the place I found, where a "rotation-by-1" is implemented as a copy to a temporary plus n copies instead of n swaps, even if n swaps would be better for some element types and "row" sizes (and be noexcept). (This is the infinite customization that Andrei talks about at https://youtu.be/FJJTYQYB1JQ?t=4475, thank you for reminding me of this talk.) It would be a fun, if academic, exercise to see if I can apply Andrei's super-duper sort to array rows. I might discover something new as well. Notice that even though Andrei criticizes the STL, he at least still uses iterators, which is enough for me. I would welcome any introspection mechanism that he would need for this.

(There are other aspects that make the library useful, such as the flattening of arbitrary subarrays, which effectively does a kind of loop fusion and makes the use of the STL even more attractive.)
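As a concrete instance of the std::sort discussion above, sorting the rows of a 2D array (a sketch; the lexicographic row comparison is assumed from the library's documented subarray ordering):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>
    #include <cassert>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A = {{3, 3, 3}, {1, 1, 1}, {2, 2, 2}};
        std::sort(A.begin(), A.end());  // rows are compared lexicographically and swapped
        assert(A[0][0] == 1);           // rows now ordered {1,1,1}, {2,2,2}, {3,3,3}
    }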
In the context of big multi-dimensional arrays, we are talking about heavy computations. And maybe the data structures not optimized for specific use cases are simply disqualified from the outset.
yes
Please, treat it as a hint on how to communicate your ideas to people in this forum, in order to convince them.
Absolutely, I appreciate it.
Regards, &rzej;
Again, this boils down to comparing apples with oranges, frameworks vs. components.
I don't have experience with OpenCV; I have seen code using OpenCV, and it seems it has a lot of features: you can load images, render windows, and do some array computations.
OpenCV itself is divided into many components: core, image I/O, processing, and many more. There are even header-only parts suitable for pure SIMD handling.
But if you don't provide algorithms, maybe I'd better take a library/framework that does.
Exactly, that is the appeal of frameworks. Frameworks are great if they do all that you need and no less.
The moment you need to do something that the framework has not contemplated you are totally on your own, without much help.
There is nothing wrong with having a good component that can interface with existing systems like cblas, OpenCV's Mat (like np.ndarray cooperates with cv2), and so on, especially if you can provide a "better" and more suitable interface to a somewhat aged cv2. For example, Boost.Locale wraps the horrible ICU API. BUT - it looks like it creates neither a useful framework nor a useful core library, _because_ it does not provide the basic functionality expected from a library with numeric processing use in mind.
There are plenty of numpy-like arrays around. Usually they are called tensors...
I agree, and almost all of them are frameworks.
I don't think any of them work with existing STL algorithms and iterators. If you are not interested in this, this is not for you.
Ok, very good point. Many/most STL algorithms aren't that suitable for numeric computations. You wouldn't store vectors in a std::list and run addition on them (even if you can) because it would perform horribly. More about it below.
I did a brief look into the implementation and it seems to be lacking vectorization (use of intrinsics/vector operators), or have I missed it?
You missed that this is a generic, not specifically numerical, library.
But you are making a numpy-like library...
You keep bringing up numpy. Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context.
If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.
90% of your use cases are related to numeric computations - and for a VERY good reason! So while it can be nice to have a std::string tensor (also not sure what for), I don't see an actual need for this outside numeric computations. (But I may be wrong.)
otherwise you wouldn't be interfacing cblas.
Would you feel better about it if I removed the BLAS adaptor? It is a completely optional component.
This is to make the library more immediately useful without converting it into a framework. The interface is strictly about using BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.
1st, you can use lambdas in a tensor context - it is actually done quite extensively in libraries like pytorch - but it is done a little bit differently. You run something like this (pseudo code):

    run_algo_parallel(a, b, [](a_section, b_section, range) {
        for (i : range) {
            b_section[i] += a_section[i] * 2;
        }
    });

It allows running over a range, using SIMD when possible, and so on, even parallelizing. It is quite a different approach, for use in the numerical field. I myself need to run lots of generic algorithms on MD tensors with generic ranges in dlprimitives and the pytorch opencl backend I work on. So something like broadcasting for the CPU with a dynamic number of ranges, so I can run a generic lambda, would be HIGHLY useful - but you need an interface that supports it.
you direct it to the numeric computations.
I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library.
Once again, the main use case is numeric: the vast majority of the samples are numeric, the integrations are numeric. And one example with strings. If it walks like a duck, quacks like a duck... It is my observation, and I think it is quite a reasonable one, especially since it is the main use case of strided/md arrays around.
SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts (this is the reason BLAS matrices always have at least one stride = 1). This is not the general case when you manipulate arrays dynamically. But I agree it still can be applied with some effort.
There is a VERY good reason that BLAS and virtually any library is optimized for at least one stride = 1... otherwise you'll destroy your cache. There is a reason why matrices, and even sparse matrices, are implemented as contiguous arrays in memory.
- External deps?
OpenCV has a wide range of libraries - some come with no deps, some with a few. I use OpenCV compiled on Android - core, imgproc, and imgio with very basic dependencies - of course, for image I/O I do need libpng/libjpeg, but that is an optional component.
- Arbitrary number of dims (e.g. 11 dimensions): Yes
- Non-owning view of data (e.g. manipulate memory provided by others): Yes
- Compile-time dim size: Don't think so
- Array values (owning data) (e.g. can I put arrays inside a std::list?)
Not clear what you mean. Can you put a cv::Mat in a std::list? Yes, but generally cv::Mat is reference counted.
- Value semantics (Regular) (can I assign, swap, with expected Stepanov regularity results?)
cv::Mat is reference counted.
- Move semantics (e.g. will this copy data arr2 = std::move(arr1) ?)
Don't remember
- const-propagation semantics (e.g. Mat const arr; ... is arr really read-only)
Don't think so
- Element initialization (e.g. can the arrays be initialized at declaration e.g. Mat const arr = {1.0, 2.0, ...})
Depends: https://stackoverflow.com/questions/44965940/cvmat-initialization-using-arra...
- References w/no-rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)
Yes
- Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr(3, 4, 5, 6)`?)
- Partial element access (e.g. take the n-th column or n-th row of a 2D array)
If I understand you correctly, yes to both.
- Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
Yes
- Subarray with lower dim (e.g. generate a "view" nD subblock of an mD array, where n < m)
yes
- Subarray w/ well-defined layout (e.g. access the layout of a subblock, if subblocks can be referred to)
Not sure I understand what you mean.
- Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries?)
If I understand you correctly - yes
- Custom allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)
cv::Mat has custom allocator support.
- PMR allocators (e.g. use std::pmr::monotonic_memory_resource)
Not sure what you mean.
- Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)
Don't think so. But ultimately you have a pointer to specific memory even with boost::interprocess - though the pointer may differ between processes. Not familiar with thrust.
- Stride-based layout (e.g. supports strided layouts and can give this information to low-level libraries)
cv::Mat has a strided layout.
- Fortran ordering (e.g. the left index is the fast index in memory)
= strided layout (i.e. transpose)
- Zig-zag / Tiles / Hilbert ordering / (e.g. fancy layouts beyond strides)
Not familiar with that, so I can't say.
- Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag)
Not sure it is even relevant for numerical processing... but I don't think so.
- Flattening of elements (e.g. any facilities to look at the elements in a flatted way beyond simply giving a pointer to the first element, which will not work for subblocks)
Not sure what you mean.
- Iterators (e.g. have the array, in any useful sense, .begin and .end?)
AFAIR yes - though I myself rarely use them.
- Multidimensional iterators (cursors) (e.g. auto h = arr_subarray.home(); h gives access to elements but is light as a pointer)
Not sure what you mean.
- STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm?)
Not sure; I haven't worked with that, because for numerical data you usually use specific approaches that are more efficient than generic algorithms. There are many standard algorithms there that are numerically aware and more suitable.
- Compatibility with Boost (e.g. put arrays in Boost containers, use Boost Serialization, Boost.Interprocess, Boost algorithms)
- Compatibility with Thrust or GPUs (e.g. can the array elements be in the GPU, using the array through its interface without segfaulting, or use thrust::device_pointer memory?)
Not clear what you mean. Just for the record, there are cv::cuda, cv OpenCL, etc.
- Used in production (e.g. major users or industries)
OpenCV is used almost everywhere in the industry.
While it is nice to have an abstraction - if so, either keep it basic or go all the way - you are stuck somewhere in between std::vector++ and something like OpenCV.
This is a fair assessment. If you see resemblances with OpenCV, that is welcome but accidental, since it is a library that I don't use. Sorry if you are disappointed that this library doesn't do (directly at least) things that OpenCV does; the library definitely can do other things that OpenCV can't (and you are not interested in), and even if OpenCV can do them, it will do them with an interface that is not within the scope of the goals of my library.
I don't expect it to do everything OpenCV does - since for things that OpenCV does, we have OpenCV. I don't see a reason to reinvent the wheel, especially since it works very, very well. However, if you try to create a good multi-array interface, make sure it improves on the interfaces of existing libraries like OpenCV, has interoperability, and brings real value, especially since the main use case is numerical computations. I would like to see an interface that is useful in a numerical context to run generic computations.

Here is an example of a real use case I would love to see: I have several dynamic arrays and I want to run a pointwise or reduction operation over them in an efficient way, automatically doing broadcasting and/or reduction. Here is an example I use in my project for the GPU case: https://github.com/artyom-beilis/pytorch_dlprim/blob/master/src/pointwise_op...

    dlprim::core::pointwise_operation({x, buf, dy}, {dx}, {},
        R"xxx(
            int is_negative = x0 < 0;
            dtype maxd = is_negative ? 1.0f : 0.0f;
            dtype s = is_negative ? 1.0f : -1.0f;
            y0 = (maxd - s * (x1 / ((dtype)(1) + x1))) * x2;
        )xxx",
        getExecutionContext(self));

This is dynamically generated GPU code. But let's extend something like that to the following cases:
- input arrays should broadcast and have a dynamic number of dimensions
- consider that input values may be SIMD-enabled when possible

Consider improving this: https://answers.opencv.org/question/22115/best-way-to-apply-a-function-to-ea... Especially in the case where you don't know the number of dimensions in advance.

Artyom
I don't doubt this.
I think it is good that both libraries are different, I don't need all the bells and whistles that OpenCV has and it is undeniably a heavy dependency.
It actually isn't. It depends on the components you select. To be fair, it is way smaller, easier to build, and more stable in comparison to Boost :-)
What is not clear to me is why I should use it over some existing solution like OpenCV.
- Because sometimes your elements are not numerical types.
Yeahhh... don't buy it. Sorry :-)
Yeah, but I do need a 100x100 array of std::strings, and a 20x10 array of std::tuples. I would love to know if OpenCV can store those.
And give me a real-world use case for using strided storage for std::strings...
- Because sometimes you want to specify your element types as template parameters not as OpenCV encoded types.
Fair. I would like to see something like float16 and bfloat16 in OpenCV. Or another float format of your choosing, for example.
To make your code compilation slower and more horrible? Actually OpenCV supports templated accessors.
This is the perennial discussion between header-only and pre-compiled libraries.
Having said all that, Boost went way too far down the "header only" path. I remember, like 10 years ago, when I used Boost.Asio in an industrial project, I needed to put all network objects into separate classes hidden behind a pimpl to get reasonable compilation times, and still the networking part, which was less than 1% of the code, took 50% of the compilation time. Actually, nowadays Asio provides compilation as a separate library (though I haven't used Asio for a long time). While header-only template classes are required, algorithms should be in a cpp file unless they are truly generic. As a rule of thumb - if you can put it in a cpp file, put it there.
OpenCV support custom allocations (actually something I exactly use right now to monitor memory consumption)
That is great, can you point me to an example? From what I quickly see online, OpenCV provides its own allocators. I am not interested in non-standard allocators for this library.
Here is my latest example, which I use to monitor memory use - it just provides a pass-through to standard allocation but counts the memory used: https://github.com/artyom-beilis/OpenLiveStacker/blob/main/src/allocator.cpp So I can discard incoming data if I am overloaded.
Can it take allocators that do not return raw pointer types such as GPU?
See, the GPU is a very different beast... For example, you don't have pointer arithmetic with OpenCL on the host; you can only create sub-buffers. In OpenCL you don't really have pointers - you have memory regions that you don't even access from the host. CUDA allows pointer arithmetic. On some GPUs, like Intel, AMD APUs, or Arm/Mali, you can share memory between the CPU and GPU (because it is shared), but it works on memory regions, not specific pointers - so it is a different beast. No pointer arithmetic, and access requires mapping/unmapping.
and if not, what if you allocate a raw pointer with cudaMalloc, how much of the library can you still use?
You can't use CUDA pointers on the host, you can only pass them around.
I'm not sure it is a great idea for huge arrays/matrices. Virtually ALL tensor libraries have reference semantics for a very good reason.
Sometimes you need it, sometimes you don't. If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library.
Ok... let's say it this way - in tensor, mat, and other libraries, the default is reference and copying is explicit - you can copy. If you make it the other way around, like std::vector, you'll make the library virtually useless, or at least highly inconvenient, for the main use case (numeric computations).
The "raw" broadcast operation in the library is an operation that cannot fail. It is how the user of the broadcasted object uses it what can create problems down the road if he doesn't know what he/she is doing. The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue. Who am I to choose for them?
Ok... that just makes it less useful - because matrix/tensor/ndarray dimension is something dynamic, not known before run time.
I think you don't agree with the scope of the library, which would be fine.
I think that in its current scope the library isn't that useful for the primary use case - that is my point. Best Regards, Artyom
participants (3)
- Alfredo Correa
- Andrzej Krzemienski
- Artyom Beilis