Date: Sat, 21 Sep 2024 20:10:25 +0300 From: Artyom Beilis
I do scientific computing with large arrays, and nobody uses OpenCV. It would be fair if you had said that it is the de facto standard for image processing, which I don't do.
Ok... just because you haven't used it does not mean that it isn't a very common computing library that many developers/designers go to by default.
Maybe we don't agree on what numeric means, or we just work in different environments. Anyway, I would agree that OpenCV is unquestionably a standard for image processing.
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
cv::Mat is a numpy-like NDArray with strides, windows, and offsets (yes, it supports more than two dimensions). I have myself used and written several tensor-like objects (PyTorch C++ tensors, dlprimitives, and others). It is nothing new; by all means it is the easiest part of any numerical library.
I don't know how generically OpenCV can treat strides, windows, offsets, etc.; I trust you. Again, this boils down to comparing apples with oranges, frameworks vs. components. I don't have experience with OpenCV; I have seen code using OpenCV, and it seems it has a lot of features: you can load images, render windows, and do some array computations. It doesn't look like generic code to me; the STL cannot be used directly, for example (which is fine). I propose that, if you need to read images and do the standard things that OpenCV is built for, such as image processing, and probably much beyond, by all means use OpenCV. And I propose that if you need to do generic programming with standard algorithms, iterators, and some level of abstraction, use Multi, or something else. If you are not interested in the second (which is fine) then you are not the right audience for the proposed library.
2. While actual ndarray handling is nice, it is basically the tip of the iceberg. You need a decent set of high-performance algorithms combined with it.
I couldn't agree more. The thing that I appreciate about generic programming and programming with components is that it separates data structures and algorithms. And if one does it right, the goal is no loss of efficiency. Of course this requires a lot of work, especially for library developers.
There is a clear separation of concerns here. The Multi library deals with the data structure; it doesn't provide algorithms. It uses, when it needs to fulfill its semantics, the best *existing* algorithms it can, in awareness of the data structure: algorithms that are provided to it via different mechanisms.
But if you don't provide algorithms, maybe I'd better take a library/framework that does.
Exactly, that is the appeal of frameworks. Frameworks are great if they do all that you need, and no less. The moment you need to do something that the framework has not contemplated, you are totally on your own, without much help.
There are plenty of numpy-like arrays around. Usually they are called tensors...
I agree, and almost all of them are frameworks. I don't think any of them work with existing STL algorithms and iterators. If you are not interested in this, this is not for you.
I agree, promising all of linear algebra is infinite work, like reimplementing MATLAB or Mathematica, but BLAS has a finite number of functions. The philosophy of the BLAS adaptor in particular (again, an optional component) is to interface to what BLAS offers and no more. It is more of a model of how to interface a legacy library using the features of Multi.
But that is exactly what makes OpenCV useful and multi-array like a facade with emptiness behind.
This is like saying all the STL containers are just "emptiness behind". In some sense it is true; you need algorithms, and those exist separately. I understand your feelings. The emptiness that you feel, the "rest", is the algorithms and adaptors that exist separately from the data structure. There are excellent libraries for that, starting from the STL.
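To make that concrete, here is a minimal sketch of "the algorithms exist separately" in practice; the include path and the `.elements()` flattened range are taken from Multi's documentation, so treat the exact spellings as assumptions:

    #include <boost/multi/array.hpp>  // include path assumed from the docs
    #include <algorithm>
    #include <numeric>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A({4, 5}, 0);  // 4x5 array, zero-filled
        std::iota(A.elements().begin(), A.elements().end(), 0);  // STL fills elements
        std::rotate(A.begin(), A.begin() + 1, A.end());          // STL rotates whole rows
    }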
I did a brief look into the implementation and it seems to be lacking vectorization (use of intrinsics/vector operators), or have I missed it?
You missed that this is a generic, not specifically numerical, library.
But you are making a numpy-like library...
You keep bringing up numpy. Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context. If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.
otherwise you wouldn't be interfacing cblas.
Would you feel better about it if I removed the BLAS adaptor? It is a completely optional component, there to make the library more immediately useful without converting it into a framework. The interface is strictly about using BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.
See, if you had been talking about multi-array as an advanced std::vector of generic objects... Ok - but you don't
"Advanced std::vector of generic objects" is a fair scope for the library, thank you. I would say it is *very* advanced version of std::vector, but this is just an opinion I have. I don't understand exactly what is your complain, what do you mean with "but you don't". This is what I say in the introduction: "This library offers array containers and subarrays in arbitrary dimensions with well-behaved value semantics, featuring logical access recursively across dimensions and to elements through indices and iterators." I don't know where you read numerics in there. you direct it to the numeric computations.
I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library. I use built-in types (preferably ints) in the examples for conciseness. I just say that if you want to use it in a numerical context, it still should be ok for that. The reason I don't and won't explicitly rule out numeric applications in the documentation is that, in my opinion, there are no obstacles to using the library in a numerical context. The library is in fact used in numerical contexts by other projects.
The other thing to take into account is that vectorization/parallelization is still provided by the external algorithms the library uses internally. For example, when dealing with GPU or OpenMP arrays, the library uses Thrust algorithms if they are available, which are parallel.
Just for the record, there are two levels of parallelization at the CPU level: 1st, thread-based parallelization; 2nd, SIMD-level parallelization like SSE/AVX2/Neon, where you load vectors of 16 or 32 bytes of data and process them together in a single instruction. These can increase the performance significantly.
Yes, I gave the example about thread parallelization, through execution policies, because it is cleaner. SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts (this is the reason BLAS matrices always have at least one stride = 1). This is not the general case when you manipulate arrays dynamically. But I agree it still can be applied with some effort. I would do everything I can to offer the possibility of doing SIMD through the library in combination with external algorithms, and lift any obstacles to it, if there are any. Having said that, it is not a big concern for the library implementation because 1) it is not numeric specifically, 2) semantic operations in the library, such as assignment or construction, do not involve computation. However, if you have a clear idea how to enable SIMD for a particular problem you are free to do it, once again, through specialized algorithms.
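For reference, a hedged sketch of the thread-level parallelization mentioned above, pairing an array's row range with a standard C++17 execution policy (the policy overloads are standard; the Multi side of the interface is assumed):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>
    #include <execution>

    namespace multi = boost::multi;

    void scale_rows(multi::array<double, 2>& A) {
        // each row is handled by the parallel STL; no Multi-specific machinery
        std::for_each(std::execution::par, A.begin(), A.end(), [](auto&& row) {
            for (auto&& x : row) { x *= 2.0; }
        });
    }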
OpenCV isn't 2D-only. It supports n-D tensors with views.
Ok, I stand corrected then. @Artyom, can you help me understand how it compares in these aspects that are important for my library? I bet OpenCV will do very well in the comparison. It is a simple checkbox list (a small sketch after the list illustrates a couple of the rows).
- External deps?
- Arbitrary number of dims (e.g. 11 dimensions)
- Non-owning view of data (e.g. manipulate memory provided by others)
- Compile-time dim size
- Array values (owning data) (e.g. can I put arrays inside a std::list?)
- Value semantics (Regular) (can I assign, swap, with expected Stepanov regularity results?)
- Move semantics (e.g. will `arr2 = std::move(arr1)` copy data?)
- const-propagation semantics (e.g. `Mat const arr; ...` is arr really read-only?)
- Element initialization (e.g. can the arrays be initialized at declaration, e.g. `Mat const arr = {1.0, 2.0, ...}`?)
- References w/ no rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)
- Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr(3, 4, 5, 6)`?)
- Partial element access (e.g. take the n-th column or n-th row of a 2D array)
- Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
- Subarray with lower dim (e.g. generate a "view" nD subblock of an mD array, where n < m)
- Subarray w/ well-defined layout (e.g. access the layout of a subblock, if subblocks can be referred to)
- Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries?)
- Custom allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)
- PMR allocators (e.g. use std::pmr::monotonic_memory_resource)
- Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)
- Stride-based layout (e.g. supports strided layouts and can give this information to low-level libraries)
- Fortran ordering (e.g. the left index is the fast index in memory)
- Zig-zag / tiles / Hilbert ordering (e.g. fancy layouts beyond strides)
- Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag?)
- Flattening of elements (e.g. any facilities to look at the elements in a flattened way, beyond simply giving a pointer to the first element, which will not work for subblocks)
- Iterators (e.g. does the array have, in any useful sense, .begin and .end?)
- Multidimensional iterators (cursors) (e.g. `auto h = arr_subarray.home();` h gives access to elements but is as light as a pointer)
- STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm?)
- Compatibility with Boost (e.g. put arrays in Boost containers, use Boost Serialization, Boost.Interprocess, Boost algorithms)
- Compatibility with Thrust or GPUs (e.g. can the array elements be in the GPU, using the array through its interface without segfaulting, or use thrust::device_pointer memory?)
- Used in production (e.g. major users or industries)
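As an illustration of the "value semantics" and "move semantics" rows above, this is roughly what the checklist probes (a sketch, assuming Multi's documented Regular-container behavior):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <cassert>
    #include <utility>

    namespace multi = boost::multi;

    int main() {
        multi::array<double, 2> a({1000, 1000}, 1.0);
        multi::array<double, 2> b = a;             // value semantics: deep, independent copy
        multi::array<double, 2> c = std::move(a);  // move semantics: buffer is stolen, no copy
        assert(b == c);                            // Regular: equality compares the elements
    }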
I consider that Eigen, OpenCV, Kokkos, PETSc are frameworks.
It reminds me of a comparison of Boost.Beast and a full-scale framework like CppCMS. When I reviewed Beast it was clear that it does not do 10% of what is expected from something to make an actually useful web application.
I am not an expert on Boost.Beast, but it looks like you are not the typical audience for some of Boost libraries. The key to me, although not exclusively, is the availability of generic components and generic programming.
While it is nice to have an abstraction - if so, either keep it basic or go all the way - you are stuck somewhere in between std::vector++ and something like OpenCV.
This is a fair assessment. If you see resemblances with OpenCV, that is welcome but accidental, since it is a library that I don't use. Sorry if you are disappointed that this library doesn't do (directly at least) things that OpenCV does; the library definitely can do other things that OpenCV can't (and you are not interested in), and even if OpenCV can do them, it will do them with an interface that is not within the scope of the goals of my library.
I don't feel confident adding an OpenCV column because I don't have experience with it, but feel free to help me by adding the library and answering the points of each row in the comparison table.
I suggest getting some experience with OpenCV. It is a very good library that already implements what you have (also in a different way).
I doubt OpenCV implements everything that Multi has. But even if it does, it does it with a different interface, and the interface was a priority for my design. (To be fair, Multi doesn't implement everything that OpenCV does either, for sure.)
It ain't perfect by any means. But it works, is well debugged and understood, and is a widely available library that does the job.
I don't doubt this. I think it is good that both libraries are different. I don't need all the bells and whistles that OpenCV has, and it is undeniably a heavy dependency. I don't need to render images, load images, or be optimized for the element types that OpenCV offers.
What is not clear to me is why I should use it over some existing solution like OpenCV.
- Because sometimes your elements are not numerical types.
Yeahhh... don't buy it. Sorry :-)
Yeah, but I do need a 100x100 array of std::strings, and a 20x10 array of std::tuples. I would love to know if OpenCV can store those.
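For instance, a minimal sketch of what is meant here, assuming Multi's documented constructor taking extensions plus a fill value:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <string>
    #include <tuple>

    namespace multi = boost::multi;

    int main() {
        multi::array<std::string, 2> names({100, 100}, "hello");  // non-numeric elements
        multi::array<std::tuple<int, double>, 2> pts({20, 10}, std::tuple{0, 0.0});
        names[99][99] = "world";  // ordinary element access, no encoded element types
    }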
- Because sometimes you want to specify your element types as template parameters, not as OpenCV encoded types.
To make your code compilation slower and more horrible? Actually OpenCV supports templated accessors.
This is the perennial discussion between header-only and pre-compiled libraries. I don't have anything to add; the idea is that you pay at compilation for what you use. Feel free to try the library on Godbolt to see the compilation times and the machine code it produces. It is a pity that OpenCV is not on Godbolt so we could compare both online. Having said that, I would welcome any comparison in timing and usability between the two libraries within or around the scope of what I am proposing.
- Because sometimes you want arbitrary dimensionality (e.g. 2D, 3D, 6D) to be compile-time.
And why isn't it possible with OpenCV?
I don't know, is it possible?
From the examples I see online, all OpenCV arrays, regardless of dimensionality, have the same type. Maybe I am mistaken.
- Because sometimes you want to apply generic algorithms to your arrays (STL, std::sort, std::rotate, std::ranges, boost::algorithms, serialization).
Yeah... good luck with that in numerical context. But Ok.
I use the library in numerical contexts, and I have no problem with that: https://gitlab.com/npneq/inq https://github.com/QMCPACK/qmcpack
In OpenCV you can just get several pointers and work with them
You cannot have it both ways. If you need a pointer and the sizes (and strides), Multi gives you access to that too; and then, according to your definition, it would be equally as powerful as OpenCV. https://gitlab.com/correaa/boost-multi#legacy-libraries-c-apis
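A hedged sketch of that escape hatch, with the accessor names (`base()`, `sizes()`, `stride()`) assumed from the linked README section; `legacy_c_function` is a hypothetical C API:

    #include <boost/multi/array.hpp>  // include path assumed

    extern "C" void legacy_c_function(double* data, int n0, int n1, int stride0);

    namespace multi = boost::multi;

    void call_legacy(multi::array<double, 2>& A) {
        auto const [n0, n1] = A.sizes();  // extents of each dimension
        // hand the raw pointer and the leading stride to the C API
        legacy_c_function(A.base(), static_cast<int>(n0), static_cast<int>(n1),
                          static_cast<int>(A.stride()));
    }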
- Because sometimes you want to implement functions that are oblivious to the actual dimension of your array (e.g. simultaneously view a 3D array of elements as a 2D array of something, for abstraction).
- Because sometimes you want to control allocations, use fancy pointers, parameterized memory strategies, polymorphic allocators.
OpenCV supports custom allocations (actually something I am using right now, exactly to monitor memory consumption).
That is great, can you point me to an example?
From what I quickly see online, OpenCV provides its own allocators. I am not interested in non-standard allocators for this library.
It is not clear to me whether you can use standard allocators and PMR allocators in OpenCV. Can it take allocators that do not return raw pointer types, such as for the GPU? And if not, what if you allocate a raw pointer with cudaMalloc; how much of the library can you still use?
Once again - be careful what you wish for. Templated allocators are a huge headache to work with in comparison to run-time ones (in the real, non-fancy-template world that Boost loves).
It is not a matter of what I wish. The library is not a framework; allocation is something that the user of the library brings in. I don't choose it for them.
- Because sometimes you want precise value semantics
I'm not sure it is a great idea for huge arrays/matrices. Virtually ALL tensor libraries have reference semantics for a very good reason.
Sometimes you need it, sometimes you don't. If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library. I prevent the library from making copies as much as I can, until the user really means it.
- Because sometimes you want to work with subblocks of an array in the same way you work with the whole array.
And this is exactly what a view is for... And this is common to any tensor library.
Great, I would feel bad if I didn't provide this fundamental feature. The next question is whether arrays and subarrays/subblocks/views can be treated on an equal footing.
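A small sketch of that "equal footing", with the brace-range call syntax assumed from Multi's documentation:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A({10, 10}, 0);
        auto&& sub = A({2, 5}, {3, 7});  // 3x4 subblock view, no copy
        // the view goes wherever the whole array goes, e.g. into an STL algorithm
        std::fill(sub.elements().begin(), sub.elements().end(), 1);
    }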
- Because sometimes you want to give guarantees in the presence of exceptions (the library doesn't throw exceptions, to be clear).
Ok, this is one of the points I want to discuss here in terms of design, exceptions, and broadcasting. Let's take an example (numpy, but the same works for torch::Tensor):
    a = np.ones((5, 10))
    b = np.ones((2, 5, 1))
    c = a + b
It would perform broadcasting to shape (2, 5, 10) automatically. How can it be done? You broadcast the shape of a to (2, 5, 10) using strides (0, 10, 1) and broadcast b to the same shape using strides (5, 1, 0) (I hope I have not made a mistake in the calculations). Then you can easily run on shape (2, 5, 10) and do all the calculations, fetching, etc.
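For the record, the stride trick described above, written out as a self-contained illustration in plain C++ (indexing only; this is not Multi's or numpy's API):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> a(5 * 10, 1.0), b(2 * 5 * 1, 1.0), c(2 * 5 * 10);
        long const n[3]  = {2, 5, 10};  // broadcast target shape
        long const sa[3] = {0, 10, 1};  // a(5,10) viewed as (2,5,10): stride 0 repeats it
        long const sb[3] = {5, 1, 0};   // b(2,5,1) viewed as (2,5,10): stride 0 repeats it
        for (long i = 0; i != n[0]; ++i)
            for (long j = 0; j != n[1]; ++j)
                for (long k = 0; k != n[2]; ++k)
                    c[(i*n[1] + j)*n[2] + k] =
                        a[i*sa[0] + j*sa[1] + k*sa[2]] + b[i*sb[0] + j*sb[1] + k*sb[2]];
        std::printf("%g\n", c[0]);  // prints 2; every element of c is 1 + 1
    }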
Looking into your broadcasting docs left me with the impression that it does not support anything like this.
What do you mean by "anything like this"? The library doesn't make algebraic assumptions about the arrays or the elements. There is no binary addition + in the library; you can define your own if you want, using well-known algorithms or specializations of them. Once you tell me what your specification and implementation of '+' is, I can tell you how to use broadcast for your case. If it is a good example I could even add it to the documentation. Note that + implies an algebra; I don't consider it part of the scope of the library. Broadcast instead is part of the library because I consider it a non-algebraic but very powerful array manipulation feature. Broadcast is implemented in the library within a certain design. I think you are disappointed because the broadcast is not "automatic", which is a different issue.
I read this:
First, broadcasted arrays are infinite in the broadcasted dimension; iteration will never reach the end position, and calling `.size()` is undefined behavior.
Why? You certainly can iterate over the broadcasted dimension...
Yes, you can iterate, that is fine.
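For example, something along these lines should be possible, with `.broadcasted()` taken from the broadcasting docs discussed here (the rest of the interface is assumed):

    #include <boost/multi/array.hpp>  // include path assumed

    namespace multi = boost::multi;

    int main() {
        multi::array<double, 1> b = {1.0, 2.0, 3.0};
        multi::array<double, 2> out({4, 3});
        auto it = b.broadcasted().begin();  // 2D view of b, infinite along the new dimension
        for (auto&& row : out) { row = *it; ++it; }  // the finite range bounds the loop
    }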
Also, how do you handle incompatible broadcasting without exceptions - for example, if b in the example above were of shape (2,5,3) and not (2,5,1)?
The "raw" broadcast operation in the library is an operation that cannot fail. It is how the user of the broadcasted object uses it what can create problems down the road if he doesn't know what he/she is doing. The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue. Who am I to choose for them? (I do it in a certain way in the BLAS adaptor but that is a completely different story).
Exceptions are useful. And claiming that you don't throw any means you don't use new...
Exceptions are great; that doesn't mean that every library should be throwing exceptions. And you are completely correct: the library doesn't use `new`. (I couldn't, even if I wanted to, since the arrays should work in special memory too.) The user of the library provides the allocator; the library itself doesn't throw. At most, it is the allocation (or element assignment) that can throw, and I don't have control over that because it is an arbitrary element type, and the library will handle it well.
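A hedged sketch of what "the user provides the allocator" can look like, assuming Multi's allocator template parameter and the standard allocator-aware constructor convention:

    #include <boost/multi/array.hpp>  // include path assumed
    #include <memory_resource>

    namespace multi = boost::multi;

    int main() {
        std::pmr::monotonic_buffer_resource pool;  // user-chosen memory resource
        multi::array<double, 2, std::pmr::polymorphic_allocator<double>> A(
            {100, 100}, 0.0, &pool);  // allocation comes from the pool, not from `new`
    }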
I highly doubt the practicality of the library in its current form. While a generic multi-array can be nice to have, it is actually stuck at being a little bit more than MultiArray but too far even from basic numpy.
Thank you for your assessment. I think you don't agree with the scope of the library, which would be fine. Maybe you also think that the scope of the library is more than what I state, which is something I would like to clarify in some way. I don't think this is just a little more than MultiArray, but it depends on the usage and how much you care about certain things (there is a column in the table that will help you evaluate the differences). Thank you, Alfredo
Wed, 25 Sep 2024 at 04:27 Alfredo Correa via Boost
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
So, Artyom says that storage and element access alone are insufficient to warrant the existence of a library. Alfredo says the opposite.

Alfredo, what would help here is if you demonstrated that your library has users, and have the users say why they chose it, given that they have to get the algorithms from elsewhere. Maybe you are such a user?

The rest of my post is "academic", as I do not have experience in the field.

Having only the storage and access abstraction would be preferred over a framework if, for your particular use case, you have to employ two domains, like image processing and generic AI/ML, and you want two sets of algorithms applied to the very same data. Then there may be no single framework that satisfies your need, and you may need to make two frameworks interoperate.

Next, the analogy to the STL alone is not good enough, I think. It is on you to demonstrate that the idea of generic programming also applies to *real life* usages of big multidimensional arrays. The STL itself has been criticised that, because it is generic, it cannot be optimized for particular types (https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon). In the context of big multi-dimensional arrays, we are talking about heavy computations. And maybe data structures not optimized for specific use cases are simply disqualified from the outset.

Please treat this as a hint on how to communicate your ideas to people in this forum, in order to convince them.

Regards,
&rzej;
Hi Andrzej,
On Tue, Sep 24, 2024 at 11:32 PM Andrzej Krzemienski
Wed, 25 Sep 2024 at 04:27 Alfredo Correa via Boost wrote:
Second, Multi is not a numerical library specifically; it is about the logic and semantics of multidimensional arrays and containers, regardless of the element type. ...
See, this is exactly the problem. Why would I need something like that if I need to go to all the 3rd party libraries to actually use one efficiently?
The same reason some of us use the standard library containers, or ranges, etc., even if they don't "do everything".
So, Artyom says that storage and element access alone are insufficient to warrant the existence of a library. Alfredo says the opposite.
If that is the point of the discussion and I didn't realize it, that would be a very fair point from Artyom.
Alfredo, what would help here is if you demonstrated that your library has users, and have the users say why they chose it, given that they have to get the algorithms from elsewhere. Maybe you are such a user?
I don't think it would help, but I want to be transparent, and it is good timing to do an assessment.

Project users (list in the docs):
https://github.com/QMCPACK/qmcpack (294 stars, 137 forks, 36 watch)
https://github.com/llnl/inq (23 stars, 4 forks, 10 watch; 22 stars, 16 forks on gitlab)
(disclaimer: I am, or I was, involved in the projects above)
Another is an AFQMC (auxiliary field quantum Monte Carlo) simulation code at the Flatiron Institute that is still a private repository, soon to be open source according to the authors (not involved).

Other users: 6 stars, 2 watching on the github repo; 13 stars, 5 forks on the gitlab repo; about 100 issues by associated developers, and I tracked about 2 or 3 issue openers that work at independent groups (quite advanced users, if I may say). cpplang Slack #boost-multi channel: 18 members. (Sorry if there is double counting; this is the best I can do.)

Most of the praise, from colleagues at least, is about the flexibility with allocations, in particular separating allocations from array lifetime (i.e. memory pools), which solves 50% of the problems of value semantics with big arrays, especially on the GPU. Another was about how straightforward it was to incrementally rewrite old code.
The rest of my post is "academic", as I do not have experience in the field.
Having only the storage and access abstraction would be preferred over a framework, if for your particular use case you have to employ two domains, like image-processing and generic AI/ML, and you want two sets of algorithms applied to the very same data. Then there may be no single framework that satisfies your need, and you may need to make two frameworks interoperate.
Yep, that is another problematic aspect of frameworks. In general, the combinatorial explosion of tools necessary to make frameworks interoperate is a problem.
Next, the analogy to STL alone is not good enough, I think. It is on you to demonstrate that the idea of generic programming also applies to *real life* usages of big multidimensional arrays. STL itself has been criticised that because it is generic, it cannot be optimized for particular types. ( https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon)
Fair enough. The STL is opt-in; internal algorithms may be forwarded to the STL, and so far I didn't have the need to use other libraries, except for GPUs. But I am open to discussing this more and improving if necessary.

The STL is not perfect; in fact I found a couple of obvious problems with how traits are used in the STL, and the library maybe doesn't *optimally* fit some concepts in the STL regarding iterator categories. The main pain points, if you are curious, are that std::copy_n is not usable with broadcasted arrays (I opened a GCC lib bug with A. O'Dwyer regarding this), and that std::sort makes some overgeneralized assumptions about a trade-off between a copy (which allocates and can throw) plus n moves versus n swaps.

Incidentally, this is related to the video link you sent; this is exactly the point I found. I believe that 1) Multi iterators can fall into a new category of iterators, for which I don't have a name yet, perhaps between random-access and bidirectional, to handle broadcasted arrays; 2) std::sort (and other STL algorithms) is not customizable enough, at least in the place I found, where a "rotation-by-1" is implemented as a copy to a temporary plus n copies instead of n swaps, even if n swaps would be better for some element types and "row" sizes (and be noexcept). (This is the infinite customization that Andrei talks about at https://youtu.be/FJJTYQYB1JQ?t=4475, thank you for reminding me of this talk.) It would be a fun, if academic, exercise to see if I can apply Andrei's super-duper sort to array rows. I might discover something new as well. Notice that even though Andrei criticizes the STL, he at least still uses iterators, which is enough for me. I would welcome any introspection mechanism that he would need for this.

(There are other aspects that make the library useful, such as the flattening of arbitrary subarrays, which effectively does a kind of loop fusion and makes the use of the STL even more attractive.)
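As a concrete instance of the std::sort discussion above, sorting the rows of a 2D array (a sketch; the lexicographic row comparison is assumed from the library's documented subarray ordering):

    #include <boost/multi/array.hpp>  // include path assumed
    #include <algorithm>
    #include <cassert>

    namespace multi = boost::multi;

    int main() {
        multi::array<int, 2> A = {{3, 3, 3}, {1, 1, 1}, {2, 2, 2}};
        std::sort(A.begin(), A.end());  // rows are compared lexicographically and swapped
        assert(A[0][0] == 1);           // rows now ordered {1,1,1}, {2,2,2}, {3,3,3}
    }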
In the context of big multi-dimensional arrays, we are talking about heavy computations. And maybe the data structures not optimized for specific use cases are simply disqualified from the outset.
yes
Please, treat it as a hint on how to communicate your ideas to people in this forum, in order to convince them.
Absolutely, I appreciate it.
Regards, &rzej;
Again, this boils down to comparing apples with oranges, frameworks vs. components.
I don't have experience with OpenCV; I have seen code using OpenCV, and it seems it has a lot of features: you can load images, render windows, and do some array computations.
OpenCV itself is divided into many components: core, image I/O, processing, and many more. There are even header-only parts suitable for pure SIMD handling.
But if you don't provide algorithms, maybe I'd better take a library/framework that does.
Exactly, that is the appeal of frameworks. Frameworks are great if they do all that you need and no less.
The moment you need to do something that the framework has not contemplated you are totally on your own, without much help.
There is nothing wrong with having a good component that can interface with existing systems like cblas, OpenCV's Mat (like np.ndarray cooperates with cv2), and so on, especially if you can provide a "better" and more suitable interface to a somewhat aged cv2. For example, Boost.Locale wraps the horrible ICU API. BUT - it looks like it creates neither a useful framework nor a useful core library, _because_ it does not provide the basic functionality expected from a library with numeric processing use in mind.
There are plenty of numpy-like arrays around. Usually they are called tensors...
I agree, and almost all of them are frameworks.
I don't think any of them work with existing STL algorithms and iterators. If you are not interested in this, this is not for you.
Ok, very good point. Many/most STL algorithms aren't that suitable for numeric computations. You wouldn't store vectors in a std::list and run addition on them (even if you can) because it would perform horribly. More about it below.
I did a brief look into the implementation and it seems to be lacking vectorization (use of intrinsics/vector operators), or have I missed it?
You missed that this is a generic, not specifically numerical, library.
But you are making a numpy-like library...
You keep bringing up numpy. Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context.
If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.
90% of your use cases are related to numeric computations - and for a VERY good reason! So while it can be nice to have a std::string tensor (also not sure what for), I don't see an actual need for this outside numeric computations. (But I may be wrong.)
otherwise you wouldn't be interfacing cblas.
Would you feel better about it if I removed the BLAS adaptor? It is a completely optional component.
This is to make the library more immediately useful without converting it into a framework. The interface is strictly about using BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.
1st, you can use lambdas in a tensor context - it is actually done quite extensively in libraries like pytorch - but it is done a little bit differently. You run something like this (pseudo code):

    run_algo_parallel(a, b, [](a_section, b_section, range) {
        for (i : range) {
            b_section[i] += a_section[i] * 2;
        }
    });

It allows running over a range, using SIMD when possible, and so on, even parallelizing. It is quite a different approach, for use in the numerical field. I myself need to run lots of generic algorithms on MD tensors with generic ranges in dlprimitives and the pytorch opencl backend I work on. So something like broadcasting for the CPU with a dynamic number of ranges, so I can run a generic lambda, would be HIGHLY useful - but you need an interface that supports it.
you direct it to the numeric computations.
I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library.
Once again, the main use case is numeric: the vast majority of the samples are numeric, the integrations are numeric. And one example with strings. If it walks like a duck, quacks like a duck... It is my observation, and I think it is quite a reasonable one, especially since it is the main use case of strided/md arrays around.
SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts (this is the reason BLAS matrices always have at least one stride = 1). This is not the general case when you manipulate arrays dynamically. But I agree it still can be applied with some effort.
There is a VERY good reason that BLAS and virtually any library is optimized for at least one stride = 1... otherwise you'll destroy your cache. There is a reason why matrices, and even sparse matrices, are implemented as contiguous arrays in memory.
- External deps?
OpenCV has a wide range of libraries - some come with no deps, some with a few. I use OpenCV compiled on Android - core, imgproc, and imgio with very basic dependencies - of course, for image I/O I do need libpng/libjpeg, but that is an optional component.
- Arbitrary number of dims (e.g. 11 dimensions): Yes
- Non-owning view of data (e.g. manipulate memory provided by others): Yes
- Compile-time dim size: Don't think so
- Array values (owning data) (e.g. can I put arrays inside a std::list?)
Not clear what you mean. Can you put a cv::Mat in a std::list? Yes, but generally cv::Mat is reference counted.
- Value semantics (Regular) (can I assign, swap, with expected Stepanov regularity results?)
cv::Mat is reference counted.
- Move semantics (e.g. will this copy data arr2 = std::move(arr1) ?)
Don't remember
- const-propagation semantics (e.g. Mat const arr; ... is arr really read-only)
Don't think so
- Element initialization (e.g. can the arrays be initialized at declaration e.g. Mat const arr = {1.0, 2.0, ...})
Depends: https://stackoverflow.com/questions/44965940/cvmat-initialization-using-arra...
- References w/no-rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)
Yes
- Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr(3, 4, 5, 6)`?)
- Partial element access (e.g. take the n-th column or n-th row of a 2D array)
If I understand you correctly, yes to both.
- Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
Yes
- Subarray with lower dim (e.g. generate a "view" nD subblock of an mD array, where n < m)
yes
- Subarray w/ well-defined layout (e.g. access the layout of a subblock, if subblocks can be referred to)
Not sure I understand what you mean.
- Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries?)
If I understand you correctly - yes
- Custom allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)
cv::Mat has custom allocator support.
- PMR allocators (e.g. use std::pmr::monotonic_memory_resource)
Not sure what you mean.
- Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)
Don't think so. But ultimately you have a pointer to specific memory even with boost::interprocess - though the pointer may differ between processes. Not familiar with thrust.
- Stride-based layout (e.g. supports strided layouts and can give this information to low-level libraries)
cv::Mat has a strided layout.
- Fortran ordering (e.g. the left index is the fast index in memory)
= strided layout (i.e. transpose)
- Zig-zag / Tiles / Hilbert ordering / (e.g. fancy layouts beyond strides)
Not familiar with that, so I can't say.
- Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag)
Not sure it is even relevant for numerical processing... but I don't think so.
- Flattening of elements (e.g. any facilities to look at the elements in a flatted way beyond simply giving a pointer to the first element, which will not work for subblocks)
Not sure what you mean.
- Iterators (e.g. have the array, in any useful sense, .begin and .end?)
AFAIR yes - though I myself rarely use them.
- Multidimensional iterators (cursors) (e.g. auto h = arr_subarray.home(); h gives access to elements but is light as a pointer)
Not sure what you mean.
- STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm?)
Not sure; I haven't worked with that, because for numerical data you usually use specific approaches that are more efficient than generic algorithms. There are many standard algorithms there that are numerically aware and more suitable.
- Compatibility with Boost (e.g. put arrays in Boost containers, use Boost Serialization, Boost.Interprocess, Boost algorithms)
- Compatibility with Thrust or GPUs (e.g. can the array elements be in the GPU, using the array through its interface without segfaulting, or use thrust::device_pointer memory?)
Not clear what you mean. Just for the record, there are cv::cuda, cv OpenCL, etc.
- Used in production (e.g. major users or industries)
OpenCV is used almost everywhere in the industry.
While it is nice to have an abstraction - if so, either keep it basic or go all the way - you are stuck somewhere in between std::vector++ and something like OpenCV.
This is a fair assessment. If you see resemblances with OpenCV, that is welcome but accidental, since it is a library that I don't use. Sorry if you are disappointed that this library doesn't do (directly at least) things that OpenCV does; the library definitely can do other things that OpenCV can't (and you are not interested in), and even if OpenCV can do them, it will do them with an interface that is not within the scope of the goals of my library.
I don't expect it to do everything OpenCV does - since for things that OpenCV does, we have OpenCV. I don't see a reason to reinvent the wheel, especially since it works very, very well. However, if you try to create a good multi-array interface, make sure it improves on the interfaces of existing libraries like OpenCV, has interoperability, and brings real value, especially since the main use case is numerical computations. I would like to see an interface that is useful in a numerical context to run generic computations.

Here is an example of a real use case I would love to see: I have several dynamic arrays and I want to run a pointwise or reduction operation over them in an efficient way, automatically doing broadcasting and/or reduction. Here is an example I use in my project for the GPU case: https://github.com/artyom-beilis/pytorch_dlprim/blob/master/src/pointwise_op...

    dlprim::core::pointwise_operation({x, buf, dy}, {dx}, {},
        R"xxx(
            int is_negative = x0 < 0;
            dtype maxd = is_negative ? 1.0f : 0.0f;
            dtype s = is_negative ? 1.0f : -1.0f;
            y0 = (maxd - s * (x1 / ((dtype)(1) + x1))) * x2;
        )xxx",
        getExecutionContext(self));

This is dynamically generated GPU code. But let's extend something like that to the following cases:
- input arrays should broadcast and have a dynamic number of dimensions
- consider that input values may be SIMD-enabled when possible

Consider improving this: https://answers.opencv.org/question/22115/best-way-to-apply-a-function-to-ea... Especially in the case where you don't know the number of dimensions in advance.

Artyom
I don't doubt this.
I think it is good that both libraries are different, I don't need all the bells and whistles that OpenCV has and it is undeniably a heavy dependency.
It actually isn't. It depends on the components you select. To be fair, it is way smaller, easier to build, and more stable in comparison to Boost :-)
What is not clear to me is why I should use it over some existing solution like OpenCV.
- Because sometimes your elements are not numerical types.
Yeahhh... don't buy it. Sorry :-)
Yeah, but I do need a 100x100 array of std::strings, and a 20x10 array of std::tuples. I would love to know if OpenCV can store those.
And give me a real-world use case for using strided storage for std::strings...
- Because sometimes you want to specify your element types as template parameters not as OpenCV encoded types.
Fair. I would like to see something like float16 and bfloat16 in OpenCV. Or another float format of your choosing, for example.
To make your code compilation slower and more horrible? Actually OpenCV supports templated accessors.
This is the perennial discussion between header-only and pre-compiled libraries.
Having said all that, Boost went way too far down the "header only" path. I remember, like 10 years ago, when I used Boost.Asio in an industrial project, I needed to put all network objects into separate classes hidden behind a pimpl to get reasonable compilation times, and still the networking part, which was less than 1% of the code, took 50% of the compilation time. Actually, nowadays Asio provides compilation as a separate library (though I haven't used Asio for a long time). While header-only template classes are required, algorithms should be in a cpp file unless they are truly generic. As a rule of thumb - if you can put it in a cpp file, put it there.
OpenCV support custom allocations (actually something I exactly use right now to monitor memory consumption)
That is great, can you point me to an example? From what I quickly see online, OpenCV provides its own allocators. I am not interested in non-standard allocators for this library.
Here is my latest example, which I use to monitor memory use - it just provides a pass-through to standard allocation but counts the memory used: https://github.com/artyom-beilis/OpenLiveStacker/blob/main/src/allocator.cpp So I can discard incoming data if I am overloaded.
Can it take allocators that do not return raw pointer types such as GPU?
See, the GPU is a very different beast... For example, you don't have pointer arithmetic with OpenCL on the host; you can only create sub-buffers. In OpenCL you don't really have pointers - you have memory regions that you don't even access from the host. CUDA allows pointer arithmetic. On some GPUs, like Intel, AMD APUs, or Arm/Mali, you can share memory between the CPU and GPU (because it is shared), but it works on memory regions, not specific pointers - so it is a different beast. No pointer arithmetic, and access requires mapping/unmapping.
and if not, what if you allocate a raw pointer with cudaMalloc, how much of the library can you still use?
You can't use CUDA pointers on the host, you can only pass them around.
I'm not sure it is a great idea for huge arrays/matrices. Virtually ALL tensor libraries have reference semantics for a very good reason.
Sometimes you need it, sometimes you don't. If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library.
Ok... let's say it this way - in tensor, mat, and other libraries, the default is reference and copying is explicit - you can copy. If you make it the other way around, like std::vector, you'll make the library virtually useless, or at least highly inconvenient, for the main use case (numeric computations).
The "raw" broadcast operation in the library is an operation that cannot fail. It is how the user of the broadcasted object uses it what can create problems down the road if he doesn't know what he/she is doing. The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue. Who am I to choose for them?
Ok... that just makes it less useful - because matrix/tensor/ndarray dimension is something dynamic, not known before run time.
I think you don't agree with the scope of the library, which would be fine.
I think that in its current scope the library isn't that useful for the primary use case - that is my point. Best Regards, Artyom
participants (3)
- Alfredo Correa
- Andrzej Krzemienski
- Artyom Beilis