Thank you, Hans and Alexander, for your interest and suggestions; I really appreciate it! Let me try to answer the questions from both of you in the same post, as they touch on many common points.
Can you add generic GPU support with Boost.Compute? https://www.boost.org/doc/libs/1_75_0/libs/compute/doc/html/index.html
Reusing what we have (uBlas, Boost.Compute) might be a good idea too.
In a parallel branch of this thread Cem Bassoy also suggested uBlas, and looking closer at both libraries, it does look like both uBlas and Boost.Compute are good options for the optimized underlying compute implementation. At first glance, Boost.Compute is slightly more appealing, because it offers raw computation tools without the extra abstraction layer of uBlas tensors. Despite the use of "tensors" in the NN interface, the operations in many NN layers are limited to element-wise operations, so it may turn out that most of the benefits of uBlas would go unused. I will need to experiment more with both libraries to get a better sense of which one is the better fit.

The preliminary idea is to split responsibilities between NN and uBlas/Boost.Compute such that the NN library defines the interface and familiar abstractions in the NN domain, while uBlas/Boost.Compute serve as the core computation engine. If this idea works out as I hope it will, we can put aside the discussion about hardware support, because it will come with the underlying compute engine, and we can focus on the convenience of the interface and the abstractions that an NN library can provide for easier use of ML elements.
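To make the intended division of labor a bit more concrete, here is a minimal, self-contained sketch of how an element-wise activation could be handed off to Boost.Compute. This is illustrative only, not the library's API; the relu function is just an example operation:

    // Illustration only: delegating an element-wise op to Boost.Compute.
    #include <boost/compute.hpp>
    #include <iostream>
    #include <vector>

    namespace compute = boost::compute;

    int main()
    {
        // pick the default device and set up a context and queue
        compute::device device = compute::system::default_device();
        compute::context context(device);
        compute::command_queue queue(context, device);

        std::vector<float> host = {-1.0f, 0.5f, 2.0f, -3.0f};

        // copy the activations to the device
        compute::vector<float> dev(host.size(), context);
        compute::copy(host.begin(), host.end(), dev.begin(), queue);

        // element-wise ReLU defined as an OpenCL function
        BOOST_COMPUTE_FUNCTION(float, relu, (float x),
        {
            return x > 0.0f ? x : 0.0f;
        });

        compute::transform(dev.begin(), dev.end(), dev.begin(), relu, queue);

        // copy the result back to the host
        compute::copy(dev.begin(), dev.end(), host.begin(), queue);

        for (float v : host)
            std::cout << v << ' ';
        std::cout << '\n';
    }

In this split the NN layer would only decide which operation to apply and to which buffers, while Boost.Compute handles device selection and kernel generation; using uBlas for the matrix-heavy layers would follow the same pattern.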
Are you building the network at compile-time or run-time? It looks from your examples like it is compile-time. I think your library should offer both. Building the network at compile-time may give some speed benefits as it can gain from compiler optimisations, but it would require re-compilation to change the network itself. Building the network at run-time means you can change the network without re-compiling. This is useful for example when you want to read the network configuration (not only its weights) at run-time from a configuration file. It is possible to offer both implementations under a unified interface, as I am doing in Boost.Histogram. Other libraries which offer this are std::span and the Eigen library.
Using compile-time models keeps this focused on the usage of ML rather than its development, and allows compiler optimizations to be applied, which are very important for small models.
What is currently on GitHub is a compile-time model-building framework. Compiler optimizations are one of the reasons. Another reason is that it catches accidental compatibility problems between layers at a very early stage. The latter turned out to be convenient right away and caught a few arithmetic mistakes I made while writing the sample MNIST model.

The support to build the network at run time is an interesting point as well. You are correct, it may be useful when the network configuration evolves from one version to another. I can think of three types of changes that may happen to a network: (1) only the layer weights change after re-training the network on a better data set; (2) the configuration or size of the inner layers changes, but the input and output shapes remain the same; (3) the entire network changes, including the hidden layers and either the input or output shape, or both. Of these, a compile-time model is suitable for the first and third: if either the input or output shape changes, the code around the model will most likely be recompiled anyway to adapt to the new values. So it is the second type of change that cannot be handled by a compile-time model and needs run-time model reconstruction.

I do not have a good intuition about how frequent each type of change may be, and the answer may depend on how exactly an updated model is released and deployed. If the model upgrade is done as part of a new release, then the code is recompiled anyway, so there is no difference. If the new model is released in the form of a configuration update, then run-time reconstruction of the network will come in handy, as you correctly note. The price for such convenience is that compile-time optimizations may be reduced, although the difference may be offset by using uBlas/Boost.Compute as the computation engine. I agree that it is probably a good idea to offer both options with the same interface, but I need to think more about how this can actually be achieved in code. I have not yet had a chance to look at the libraries you mentioned, but I will definitely do so for inspiration.
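Just to sketch the direction I have in mind (hypothetical names, C++20, not working library code), the same layer template could carry either a compile-time or a run-time extent, in the spirit of std::span's static/dynamic extent:

    // Illustration only: one layer template, two extents.
    #include <array>
    #include <cstddef>
    #include <type_traits>
    #include <vector>

    // marker for "size chosen at run time", similar to std::dynamic_extent
    inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

    template <std::size_t In = dyn, std::size_t Out = dyn>
    class fully_connected
    {
        static constexpr bool is_static = (In != dyn) && (Out != dyn);

        // fixed-size storage when the shape is known at compile time,
        // heap storage when it is only known at run time
        using weights_t = std::conditional_t<is_static,
            std::array<float, (is_static ? In * Out : 0)>, std::vector<float>>;
        using bias_t = std::conditional_t<is_static,
            std::array<float, (is_static ? Out : 0)>, std::vector<float>>;

    public:
        // compile-time variant: the shape is part of the type
        fully_connected() requires is_static {}

        // run-time variant: the shape can come from a configuration file
        fully_connected(std::size_t in, std::size_t out) requires (!is_static)
            : weights_(in * out), bias_(out) {}

    private:
        weights_t weights_{};
        bias_t bias_{};
    };

    // fully_connected<784, 128> fixed;              // shape fixed at compile time
    // fully_connected<>         flexible(784, 128); // shape read at run time

The compile-time variant keeps all sizes in the type, so loops can be unrolled and shape mismatches show up as compile errors; the run-time variant pays with heap storage and run-time checks, but can be rebuilt from a configuration file without recompiling.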
Smallish networks are certainly a niche; if you want to do anything serious you won't be able to beat TF/PyTorch in performance. So keeping this focused on small, static (aka compile-time) models with only the basic layers, and maybe even with optional training (removing this avoids the need for auto-differentiation), could be the way.
This is indeed the niche I am targeting. To my knowledge, large networks not only require the best possible performance to reduce the cost of training and running them, but also come with the additional challenge of running training and prediction in a distributed way, because the model, or even its input, may not fit into memory. These are all interesting problems to solve, but they would require solutions like Hadoop or Apache Spark, and that is a completely different topic.
However, I fear this is not a fit for Boost. ML evolves so fast, adding more and more layer types etc., that I fear this library would already be outdated during review. The only chance I see is if this is purposely for very basic networks, i.e. FullyConnected, Convolution, SoftMax and similar basic layers, maybe with an extension to provide an ElementWise and a BinaryOp layer templated by the operator (this may be problematic for auto-differentiation though).
You are correct, ML is evolving very quickly, and new configurations and layer types are proposed all the time. If I had to choose between two approaches, trying to keep the library up to date with the latest research versus keeping it scoped to a subset of layers that have proven useful across a variety of networks, I would opt for the latter. The good thing about ML layers is that they can usually be added incrementally, and which layers to add can be determined by popular demand. But I can only speak for myself, and I will trust the judgment of the Boost community members.

Best regards,
Sergei Marchenko.
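P.S. Alexander, regarding the ElementWise layer templated by the operator: I imagine something along these lines (a rough sketch with made-up names, leaving the auto-differentiation question aside):

    // Illustration only: an element-wise layer parameterized by its operation.
    #include <algorithm>
    #include <vector>

    template <class UnaryOp>
    class elementwise_layer
    {
    public:
        explicit elementwise_layer(UnaryOp op = {}) : op_(op) {}

        // forward pass: apply the operation to every element independently
        std::vector<float> forward(const std::vector<float>& x) const
        {
            std::vector<float> y(x.size());
            std::transform(x.begin(), x.end(), y.begin(), op_);
            return y;
        }

    private:
        UnaryOp op_;
    };

    // a ReLU layer is then just the element-wise layer over max(x, 0):
    // auto relu = elementwise_layer([](float v) { return v > 0.0f ? v : 0.0f; });

The auto-differentiation concern you raise is real, though: the backward pass would need the derivative of the operator, so the caller would probably have to supply it alongside the forward operation.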