Date: Sat, 21 Sep 2024 20:10:25 +0300 From: Artyom Beilis
I do scientific computing with large arrays, and nobody uses OpenCV. It would be fair if you had said that it is the de facto standard for image processing, which I don't do.
Ok... the fact that you haven't used it does not mean that it isn't a very common computing library that many developers/designers go to by default.
Maybe we don't agree on what numeric means, or we just work in different environments. Anyway, I would agree that OpenCV is unquestionably a standard for image processing.
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd-party libraries to actually use it efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
cv::Mat is a numpy-like NDArray with strides, windows, and offsets (yes, it supports more than two dimensions). I myself have used/written several Tensor-like objects (the PyTorch C++ tensor, dlprimitives, and other objects). It is nothing new; by all means it is the easiest part of any numerical library.
I don't know how generically OpenCV can treat strides, windows, offsets, etc.; I trust you. Again, this boils down to comparing apples with oranges, frameworks vs components. I don't have experience with OpenCV; I have seen code using OpenCV, and it seems it has a lot of features: you can load images, render windows, and do some array computations. It doesn't look like generic code to me; the STL cannot be used directly, for example (which is fine). I propose that, if you need to read images and do the standard things that OpenCV is built for, such as image processing, and probably much beyond, by all means use OpenCV. And I propose that if you need to do generic programming with standard algorithms, iterators, and some level of abstraction, use Multi, or something else. If you are not interested in the second (which is fine) then you are not the right audience for the proposed library.
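To make the "generic programming with standard algorithms" point concrete, here is a minimal sketch of the kind of code I mean (the include path is the one I recall; it may vary by version/installation):

    #include <boost/multi/array.hpp>  // include path as I recall it; may vary by version
    #include <algorithm>
    #include <numeric>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A = {
            {9, 1, 2},
            {3, 0, 4},
        };
        std::sort(A.begin(), A.end());  // rows are ranges: sorts rows lexicographically
        int s = std::accumulate(A[0].begin(), A[0].end(), 0);  // sum of the new first row
        return s == 7 ? 0 : 1;  // {3, 0, 4} sums to 7
    }

Note that nothing here is numeric in an essential way; std::sort and std::accumulate are plain STL algorithms operating on the array's rows as ordinary ranges.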
2. While actual ndarray handling is nice, it is basically the tip of an iceberg. You need a decent set of high-performance algorithms combined with it.
I couldn't agree more. The thing that I appreciate about generic programming and programming with components is that it separates data structures and algorithms. And if one does it right, the goal is no loss of efficiency. Of course this requires a lot of work, especially for library developers.
There is a clear separation of concerns here. The Multi library deals with the data structure; when it needs to fulfill its semantics, it uses the best *existing* algorithms it can, in awareness of the data structure. The Multi library doesn't provide algorithms; it uses the algorithms that are provided to it via different mechanisms.
But if you don't provide algorithms, maybe I'd better take a library/framework that does.
Exactly, that is the appeal of frameworks. Frameworks are great if they do all that you need and no less. The moment you need to do something that the framework has not contemplated, you are totally on your own, without much help.

There are plenty of numpy-like arrays around. Usually they are called tensors...
I agree, and almost all of them are frameworks. I don't think any of them work with existing STL algorithms and iterators. If you are not interested in this, this is not for you.
I agree, promising all of linear algebra is infinite work, like reimplementing MATLAB or Mathematica, but BLAS has a finite number of functions. The philosophy of the BLAS adaptor in particular (again, an optional component) is to interface with what BLAS offers and no more. It is more of a model of how to interface a legacy library using the features of Multi.
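As an illustration only of the style the adaptor aims at (the header location and the exact gemm signature here are assumptions from memory; check the adaptor documentation):

    #include <boost/multi/array.hpp>
    #include <boost/multi/adaptors/blas.hpp>  // assumed header location; check the docs

    namespace multi = boost::multi;
    namespace blas  = multi::blas;

    int main() {
        multi::array<double, 2> const A({4, 3}, 1.0);
        multi::array<double, 2> const B({3, 5}, 2.0);
        // functional, immutable use: inputs are const, the result is a new value
        multi::array<double, 2> C = blas::gemm(1.0, A, B);  // C = 1.0*(A.B); signature from memory
    }

The point is the functional (immutable) style: no encoded type tags, no manual leading dimensions, just values in and values out.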
But that is exactly what makes OpenCV useful and multi-array like a facade with emptiness behind.
This is like saying all the STL containers are just "emptiness behind". In some sense it is true; you need algorithms, which exist separately. I understand your feelings. The emptiness that you feel, the "rest", is the algorithms and adaptors that exist separately from the data structure. There are excellent libraries for that, starting from the STL.
I did a brief look into the implementation and it seems to be lacking vectorization (use of intrinsics/vector operators), or have I missed it?
You missed that this is a generic, not specifically numerical, library.
But you are making a numpy-like library...
You keep bringing up numpy. Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context. If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.
otherwise you wouldn't be interfacing cblas.
Would you feel better about it if I removed the BLAS adaptor? It is a completely optional component. It is there to make the library more immediately useful without converting it into a framework. The interface with BLAS is strictly about using BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.

See, if you have been talking about multi-array as an advanced std::vector of generic objects... Ok - but you don't
"Advanced std::vector of generic objects" is a fair scope for the library, thank you. I would say it is *very* advanced version of std::vector, but this is just an opinion I have. I don't understand exactly what is your complain, what do you mean with "but you don't". This is what I say in the introduction: "This library offers array containers and subarrays in arbitrary dimensions with well-behaved value semantics, featuring logical access recursively across dimensions and to elements through indices and iterators." I don't know where you read numerics in there. you direct it to the numeric computations.
I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library. I use built-in types (preferably ints) in the examples for conciseness. I just say that if you want to use it in a numerical context, it still should be ok for that. The reason I don't and I won't explicitly rule out numeric applications in the documentation is that, in my opinion, there are no obstacles to using the library in a numerical context. The library is in fact used in numerical contexts by other projects.
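For example, this is the kind of non-numeric semantics the introduction describes (indices, iterators, value semantics), with built-in ints for conciseness; a minimal sketch:

    #include <boost/multi/array.hpp>
    #include <cassert>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A = {
            {1, 2, 3},
            {4, 5, 6},
        };
        assert( A[1][2] == 6 );  // logical access through indices
        auto B = A;              // value semantics: deep, independent copy
        B[0][0] = 99;
        assert( A[0][0] == 1 );  // the original is unaffected
    }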
The other thing to take into account is that vectorization/parallelization is still provided by the external algorithms the library uses internally. For example, when dealing with GPU or OpenMP arrays, the library uses Thrust algorithms if they are available, which are parallel.
Just for the record, there are two levels of parallelization at the CPU level: 1st, thread-based parallelization; 2nd, SIMD-level parallelization like SSE/AVX2/NEON, where you load vectors of 16 or 32 bytes of data and process them together in a single instruction. These can increase the performance significantly.
Yes, I gave the example about thread parallelization, through execution policies, because it is cleaner. SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts (this is the reason BLAS matrices always have at least one stride = 1). This is not the general case when you manipulate arrays dynamically. But I agree it still can be applied with some effort. I would do everything I can to offer the possibility of doing SIMD through the library in combination with external algorithms and lift any obstacles to it, if there are any. Having said that, it is not a big concern for the library implementation because 1) it is not numeric specifically, 2) semantic operations in the library, such as assignment or construction, do not involve computation. However, if you have a clear idea of how to enable SIMD for a particular problem, you are free to do it, once again, through specialized algorithms.
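A sketch of what I mean by delegating parallelization/vectorization to external algorithms; here I assume the flattened .elements() range (the "flattening of elements" facility from my comparison list) and a standard C++17 execution policy:

    #include <boost/multi/array.hpp>
    #include <algorithm>
    #include <execution>

    namespace multi = boost::multi;

    int main() {
        multi::array<double, 2> A({1000, 1000}, 1.0);
        auto&& elems = A.elements();  // flattened range over all elements
        // threading/vectorization comes from the standard algorithm and its
        // execution policy, not from Multi itself
        std::transform(std::execution::par_unseq,
                       elems.begin(), elems.end(), elems.begin(),
                       [](double x) { return 2.0 * x; });
    }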
OpenCV isn't 2D only. It supports n-D tensors with views.
Ok, I stand corrected then. @Artyom, can you help me understand how it compares in the aspects that are important for my library? I bet OpenCV will do very well in the comparison. It is a simple checkbox list.

- External deps?
- Arbitrary number of dims (e.g. 11 dimensions)
- Non-owning view of data (e.g. manipulate memory provided by others)
- Compile-time dim size
- Array values (owning data) (e.g. can I put arrays inside a std::list?)
- Value semantics (Regular) (can I assign, swap, with the expected Stepanov regularity results?)
- Move semantics (e.g. will `arr2 = std::move(arr1)` copy data?)
- const-propagation semantics (e.g. `Mat const arr; ...` is arr really read-only?)
- Element initialization (e.g. can arrays be initialized at declaration, e.g. `Mat const arr = {1.0, 2.0, ...}`?)
- References w/ no rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)
- Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr(3, 4, 5, 6)`?)
- Partial element access (e.g. take the n-th column or n-th row of a 2D array)
- Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
- Subarray with lower dim (e.g. generate a "view" nD subblock of an mD array, where n < m)
- Subarray w/ well-defined layout (e.g. access the layout of a subblock, if subblocks can be referred to)
- Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries?)
- Custom allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)
- PMR allocators (e.g. use std::pmr::monotonic_memory_resource)
- Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)
- Stride-based layout (e.g. supports strided layouts and can give this information to low-level libraries)
- Fortran ordering (e.g. the left index is the fast index in memory)
- Zig-zag / tiles / Hilbert ordering (e.g. fancy layouts beyond strides)
- Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag?)
- Flattening of elements (e.g. any facility to look at the elements in a flattened way, beyond simply giving a pointer to the first element, which will not work for subblocks)
- Iterators (e.g. does the array have, in any useful sense, .begin and .end?)
- Multidimensional iterators (cursors) (e.g. `auto h = arr_subarray.home();` where h gives access to elements but is as light as a pointer)
- STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm?)
- Compatibility with Boost (e.g. put arrays in Boost containers, use Boost.Serialization, Boost.Interprocess, Boost algorithms)
- Compatibility with Thrust or GPUs (e.g. can the array elements be in the GPU, and the array be used through its interface without segfaulting, or use thrust::device_pointer memory?)
- Used in production (e.g. major users or industries)
I consider that Eigen, OpenCV, Kokkos, PETSc are frameworks.
It reminds me of a comparison of Boost.Beast and a full scale framework like CppCMS. When I reviewed beast it was clear that it does not do 10% of what is expected from something to make an actually useful web application.
I am not an expert on Boost.Beast, but it looks like you are not the typical audience for some of Boost libraries. The key to me, although not exclusively, is the availability of generic components and generic programming.
While it is nice to have an abstraction - if so, either keep it basic or go the full way - you are stuck somewhere in between std::vector++ and something like OpenCV.
This is a fair assessment. Any resemblance you see with OpenCV is welcome but accidental, since it is a library that I don't use. Sorry if you are disappointed that this library doesn't do (directly, at least) things that OpenCV does; the library definitely can do other things that OpenCV can't (and that you are not interested in), and even where OpenCV can do them, it does them with an interface that is not within the scope of the goals of my library.
I don't feel confident adding an OpenCV column because I don't have experience with it, but feel free to help me by adding the library and answering the points of each row in the comparison table.
I suggest getting some experience with OpenCV. It is a very good library that already implements what you have (although in a different way).
I doubt OpenCV implements everything that Multi has. But even if it does, it does it with a different interface that was a priority for my design. (To be fair, Multi doesn't implement everything that OpenCV does either, for sure)
It ain't perfect by any means. But it works, is well debugged and well understood, and is a widely available library that does the job.
I don't doubt this. I think it is good that both libraries are different. I don't need all the bells and whistles that OpenCV has, and it is undeniably a heavy dependency. I don't need to render images, load images, or be optimized for the element types that OpenCV offers.
What is not clear to me is why should I use one over some existing solution like OpenCV?
- Because sometimes your elements are not numerical types.
Yeahhh... don't buy it. Sorry :-)
Yeah, but I do need a 100x100 array of std::string and a 20x10 array of std::tuple. I would love to know if OpenCV can store those.
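For reference, a minimal sketch of exactly those two cases in Multi; the element type is an ordinary template parameter, so nothing special is needed:

    #include <boost/multi/array.hpp>
    #include <string>
    #include <tuple>

    namespace multi = boost::multi;

    int main() {
        multi::array<std::string, 2> S({100, 100}, "hello");  // 100x100 of std::string
        multi::array<std::tuple<int, double>, 2>
            T({20, 10}, std::tuple<int, double>{0, 0.0});     // 20x10 of std::tuple
        S[0][0] = "world";
    }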
- Because sometimes you want to specify your element types as template
parameters not as OpenCV encoded types.
To make your code compile slower and more horrible? Actually, OpenCV supports templated accessors.
This is the perennial discussion between header-only and pre-compiled libraries. I don't have anything to add, the idea is that you pay at compilation for what you use. Feel free to try the library in Godbolt to see the compilation times and the machine code it produces. It is a pity that OpenCV is not in Godbolt to compare both online. Having said that, I would welcome any comparison in timing and usability between the two libraries within or around the scope of what I am proposing.
- Because sometimes you want arbitrary dimensionality (e.g. 2D, 3D, 6D)
to be compile-time.
And why isn't it possible with OpenCV?
I don't know, is it possible?
From the examples I see online, all OpenCV arrays, regardless of dimensionality, have the same type. Maybe I am mistaken.
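For comparison, in Multi the dimensionality and the element type are ordinary compile-time template parameters; a minimal sketch:

    #include <boost/multi/array.hpp>

    namespace multi = boost::multi;

    int main() {
        // dimensionality (6) and element type are compile-time template
        // parameters, not runtime-encoded type tags
        multi::array<double, 6> A({2, 3, 4, 2, 3, 4}, 0.0);
        A[1][2][3][0][1][2] = 1.0;
    }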
- Because sometimes you want to apply generic algorithms to your arrays
(STL, std::sort, std::rotate, std::ranges, boost::algorithms, serialization)
Yeah... good luck with that in a numerical context. But OK.
I use the library in numerical contexts; I have no problem with that: https://gitlab.com/npneq/inq https://github.com/QMCPACK/qmcpack
In OpenCV you can just get several pointers and work with them
You cannot have it both ways. If you need a pointer and the sizes (and strides), Multi gives you access to that too; and then, according to your definition, it would be equally as powerful as OpenCV. https://gitlab.com/correaa/boost-multi#legacy-libraries-c-apis
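A sketch of that interop; legacy_scale here is a made-up stand-in for a legacy C routine, and the accessor names (data_elements, stride) are the ones I recall from the docs:

    #include <boost/multi/array.hpp>

    // stand-in for a legacy C routine taking a raw pointer, sizes, and a leading stride
    extern "C" void legacy_scale(double* data, int rows, int cols, int lda, double f) {
        for(int i = 0; i != rows; ++i)
            for(int j = 0; j != cols; ++j)
                data[i*lda + j] *= f;
    }

    namespace multi = boost::multi;

    void scale(multi::array<double, 2>& A, double f) {
        // the pointer, the sizes, and the strides are all exposed when you need them
        legacy_scale(A.data_elements(),
                     static_cast<int>(A.size()), static_cast<int>(A[0].size()),
                     static_cast<int>(A.stride()), f);
    }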
- Because sometimes you want to implement functions that are oblivious to the actual dimension of your array (e.g. simultaneously view a 3D array of elements as a 2D array of something, for abstraction).
- Because sometimes you want to control allocations, use fancy pointers, parameterized memory strategies, polymorphic allocators,
OpenCV supports custom allocations (actually something I use right now, precisely to monitor memory consumption).
That is great, can you point me to an example?
From what I quickly see online, OpenCV provides its own allocators. I am not interested in non-standard allocators for this library.
It is not clear to me whether you can use standard allocators and PMR allocators in OpenCV. Can it take allocators that do not return raw pointer types, such as GPU allocators? And if not, what if you allocate a raw pointer with cudaMalloc: how much of the library can you still use?
Once again - be careful what you wish for. Templated allocators are a huge headache to work with in comparison to run-time ones (in the real, non-fancy-template world, as opposed to the templates Boost loves).
It is not a matter of what I wish. The library is not a framework; allocation is something that the user of the library brings in. I don't choose it for them.
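A sketch of what "the user brings the allocator" means, using a standard PMR allocator and assuming the usual allocator-aware constructor conventions (allocator as the last argument, as with std::vector):

    #include <boost/multi/array.hpp>
    #include <memory_resource>

    namespace multi = boost::multi;

    int main() {
        std::pmr::monotonic_buffer_resource pool;  // the user's allocation strategy
        multi::array<double, 2, std::pmr::polymorphic_allocator<double>>
            A({100, 100}, 0.0, &pool);  // allocator passed last, as with std::vector
    }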
- Because sometimes you want precise value semantics
I'm not sure it is a great idea for huge arrays/matrices. Virtually ALL tensor libraries have reference semantics for a very good reason.
Sometimes you need it, sometimes you don't. If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library. I prevent the library from making copies as much as I can, until the user really means it.
- Because sometimes you want to work with subblocks of an array in the same way you work with the whole array.
And this is exactly what a view is for... And this is common for any tensor library.
Great, I would feel bad if I didn't provide this fundamental feature. The next question is whether arrays and subarrays/subblocks/views can be treated on an equal footing.
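For example, a sketch of treating a subblock like an array (the parenthesized range-selection syntax and the .elements() flattening are the forms I recall from the docs):

    #include <boost/multi/array.hpp>
    #include <algorithm>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A({4, 4}, 0);
        auto&& sub = A({1, 3}, {1, 3});  // 2x2 view of the middle block, no copy
        auto&& se  = sub.elements();     // flattened view of the subblock
        std::fill(se.begin(), se.end(), 7);  // writes through to A[1][1]..A[2][2]
    }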
- Because sometimes you want to give guarantees in the presence of exceptions (the library doesn't throw exceptions, to be clear).
Ok, this is one of the points I want to discuss here in terms of design: exceptions and broadcasting. Let's take an example (numpy, but the same works for torch::Tensor):
    a = np.ones((5, 10))
    b = np.ones((2, 5, 1))
    c = a + b
It would perform broadcasting to shape (2,5,10), automatically. How can it be done? You broadcast the shape of a to (2,5,10) using strides (0,10,1) and broadcast b to the same shape using strides (5,1,0) (I hope I have not mistaken the calculations). Then you can easily run on shape (2,5,10) and do all the calculations, fetching, etc.
Looking into your broadcasting docs left me with the impression that it does not support anything like this.
What do you mean by "anything like this"? The library doesn't make algebraic assumptions about the arrays or their elements. There is no binary addition + in the library; you can define your own if you want, using well-known algorithms or specializations of them. Once you tell me what your specification and implementation of '+' is, I can tell you how to use broadcast for your case. If it is a good example, I could even add it to the documentation. Note that + implies an algebra; I don't consider it part of the scope of the library. Broadcast instead is part of the library because I consider it a non-algebraic but very powerful array-manipulation feature. Broadcast is implemented in the library within a certain design. I think you are disappointed because the broadcast is not "automatic", which is a different issue.
I read this:
First, broadcasted arrays are infinite in the broadcasted dimension; iteration will never reach the end position, and calling `.size()` is undefined behavior.
Why? You certainly can iterate over the broadcasted dimension...
Yes, you can iterate, that is fine.
Also, how do you handle incompatible broadcasting without exceptions? For example, if b in the example above were of shape (2,5,3) and not (2,5,1).
The "raw" broadcast operation in the library is an operation that cannot fail. It is how the user of the broadcasted object uses it what can create problems down the road if he doesn't know what he/she is doing. The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue. Who am I to choose for them? (I do it in a certain way in the BLAS adaptor but that is a completely different story).
Exceptions are useful. And claiming that you don't throw means you don't use new...
Exceptions are great; that doesn't mean that every library should be throwing exceptions. And you are completely correct: the library doesn't use `new`. (I couldn't, even if I wanted to, since the arrays should work in special memory too.) The user of the library provides the allocator; the library itself doesn't throw. At most, it is the allocation (or element assignment) that can throw, and I don't have control over that because it is an arbitrary element type, and the library will handle it well.
I highly doubt the practicality of the library in its current form. While a generic multi-array can be nice to have, it is actually stuck at being a little bit more than MultiArray but still too far from even basic numpy.
Thank you for your assessment. I think you don't agree with the scope of the library, which would be fine. Maybe you also think that the scope of the library is more than what I state, which is something I would like to clarify in some way. I don't think this is just a little more than MultiArray, but it depends on the usage and how much you care about certain things. (There is a column in the table that will help you evaluate the differences.) Thank you, Alfredo