On Sat, May 17, 2014 at 10:17:25AM +0200, Bjorn Reese wrote:
I have a couple of major concerns with the current submission, and I am going to suggest some substantial changes. I hope that it does not discourage you too much.
I am going to suggest that:
1. The library is broken into more fundamental building-blocks (which is what both Boost and the C++ standard is all about.)
2. A more flexible data flow architecture is adopted.
3. More use of existing Boost libraries.
I recognize three more fundamental building-blocks in the current submission: active objects, messaging middleware, and data flow. I am not against a higher level actor API, but the fundamentals need to be in place first.
Thank you for taking the time to write this thorough comment. However, I have to say I disagree on many levels. First of all: C++ is not about having a low level of abstraction. C++ is about having the highest level of abstraction possible without sacrificing performance. What you are suggesting is not to have an actor library in Boost. You want to have a low-level active object library with low-level networking primitives.
Building-blocks
---------------
Boost.Actor implements a distributed mailbox-based actor model. While this is a building-block to some users, it is not fundamental. It conflates the actor with distribution.
An actor *is* the fundamental primitive! We are talking about an implementation of the actor model. The whole point of the actor model is to have software entities that abstract over physical deployment. Actors are *not* the same thing as active objects. There is obviously overlap between these two models, but the actor model is more general, i.e., it describes a higher level of abstraction.
I suggest that you start with a non-distributed actor model. This is simply an active object with an incoming message queue. This can be used in its own right without distribution and mailboxes. Many applications have classes with a working thread inside them, and active objects should strive to replace these classes [1].
A non-distributed actor model? I think what you really want to say here is: you want users to be able to use the lightweight actor implementation and the work-stealing scheduler without having link-time dependencies on networking infrastructure they don't use. Can you agree on that? That's a fair point and useful indeed. Well, here's the thing: there is nothing baked into the actor primitive that would require such a dependency. The software design is fully modular. Separate the middleman, move publish/remote_actor to a different namespace, ship it separately, done. All you have to do to extend boost.actor is to provide an implementation for actor_proxy. Behind the scenes, you can do all kinds of stuff: networking, OpenCL-binding (that's how it's done in libcppa), you name it.
Active objects have two important variation points: scheduling and the queue type. Regarding scheduling there is a C++ standards proposal that should be considered [2]. There is a GSoC project about this [3]. For the distributed case, boost::asio::io_service also has to be considered.
I know the Executor proposal and I think it's too basic to be useful for anything other than implementing std::async. ThreadPool implementations in general do work-sharing, whereas boost.actor implements work-stealing. The latter yields superior performance in almost all use cases. I wouldn't mind moving the scheduler to its own namespace to allow other projects to build on top of that. The scheduler uses the interfaces resumable and execution_unit, so there aren't any actor-specific types.
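To make the work-sharing vs. work-stealing distinction concrete, here is a minimal sketch of the stealing side, under stated assumptions. In work-sharing, all workers pull from one shared queue; in work-stealing, each worker owns a queue and only looks at other queues when its own is empty. The names `job`, `worker`, and `run_one` are hypothetical and are not the resumable/execution_unit interfaces of boost.actor. The sketch also uses a mutex per queue for brevity, which a tuned scheduler would likely avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schedulable unit of work.
using job = std::function<void()>;

// One worker with its own job queue (illustrative only).
class worker {
public:
  // Adds a job to the back of this worker's queue.
  void push(job j) {
    std::lock_guard<std::mutex> guard(mtx_);
    jobs_.push_back(std::move(j));
  }

  // Called by the owner: takes the newest job from the back.
  bool pop(job& out) {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return false;
    out = std::move(jobs_.back());
    jobs_.pop_back();
    return true;
  }

  // Called by other workers: takes the oldest job from the front.
  bool steal(job& out) {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return false;
    out = std::move(jobs_.front());
    jobs_.pop_front();
    return true;
  }

private:
  std::mutex mtx_;
  std::deque<job> jobs_;
};

// Runs at most one job on behalf of workers[self]. The worker's own
// queue is tried first; only if it is empty are the other queues
// scanned for a job to steal. Returns false if no job was found.
bool run_one(std::vector<worker>& workers, std::size_t self) {
  assert(self < workers.size());
  job j;
  if (!workers[self].pop(j)) {
    bool stolen = false;
    for (std::size_t i = 0; i < workers.size() && !stolen; ++i) {
      if (i != self) stolen = workers[i].steal(j);
    }
    if (!stolen) return false;
  }
  j();
  return true;
}
```

In a real scheduler each worker would call something like `run_one` in a loop on its own thread; the point of the sketch is only that contention on a queue arises when a worker runs dry, not on every dequeue.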
There is also work done on message queues. We already have some in Boost.Lockfree, or the wait-free multi-producer queue in the Boost.Atomic examples, as well as sync_queue in Boost.Thread.
Show me a queue that outperforms single_reader_queue and I'll take it. Keep in mind though, that the enqueue operation uses only *a single* compare-and-swap operation [1]. I don't see how you can outperform that. Did you have a look at the performance evaluation? In particular the N:1 communication? This queue scales up to 63 concurrent writers without a measurable performance hit. I'm not passionate about implementation details, though. Show me a queue that performs even better in boost.actor and I'll take it.
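For readers unfamiliar with the idea, the following is a minimal sketch of a multi-writer, single-reader queue whose enqueue needs one successful compare-and-swap. It illustrates the general technique only and is not the actual single_reader_queue; the class name `single_cas_queue` and its members are hypothetical. Writers push onto an atomic LIFO stack, and the single reader detaches the whole stack at once and reverses it to restore FIFO order.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Illustrative multi-producer, single-consumer queue (hypothetical name).
template <class T>
class single_cas_queue {
public:
  single_cas_queue() : head_(nullptr), cache_(nullptr) {}
  single_cas_queue(const single_cas_queue&) = delete;
  single_cas_queue& operator=(const single_cas_queue&) = delete;

  ~single_cas_queue() {
    free_list(head_.load(std::memory_order_relaxed));
    free_list(cache_);
  }

  // May be called from any number of writer threads. Without contention
  // this is exactly one CAS; under contention the CAS is retried.
  void enqueue(T value) {
    node* n = new node{std::move(value),
                       head_.load(std::memory_order_relaxed)};
    while (!head_.compare_exchange_weak(n->next, n,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
      // On failure, n->next was updated to the current head; try again.
    }
  }

  // Must only be called from the single reader thread.
  bool try_dequeue(T& out) {
    if (cache_ == nullptr) {
      // Detach everything the writers have pushed so far.
      node* lifo = head_.exchange(nullptr, std::memory_order_acquire);
      // Reverse the detached LIFO list so the oldest element comes first.
      while (lifo != nullptr) {
        node* next = lifo->next;
        lifo->next = cache_;
        cache_ = lifo;
        lifo = next;
      }
    }
    if (cache_ == nullptr) return false;
    node* n = cache_;
    cache_ = n->next;
    out = std::move(n->value);
    delete n;
    return true;
  }

private:
  struct node {
    T value;
    node* next;
  };

  static void free_list(node* n) {
    while (n != nullptr) {
      node* next = n->next;
      delete n;
      n = next;
    }
  }

  std::atomic<node*> head_;  // shared with writers
  node* cache_;              // private to the reader, already in FIFO order
};
```

The reader touches the shared head only when its private cache is empty, so writers mostly contend with each other and rarely with the reader.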
Architecture
------------
Once we have got active objects, the question is how do we connect them? The variation points here are routing and transmission.
Again, this library is not about active objects. It's about actors.
The mailbox approach is too simple for many applications. Partly because it is too limited in some regards (e.g. push-only) and too flexible in other regards (e.g. you cannot have fine-grained access control or restricted visibility.)
There are several flow-based approaches that should be considered: Boost.Iostreams has all the required concepts in place. There was a Boost.Dataflow GSoC project [4] some years ago. There is a C++ standards proposal [5] about C++ pipelines. See also the ZeroMQ guide [6] for various examples.
All of that is true and it's damn good that libcppa/boost.actor is the way it is. I'm sorry, but again: this is an actor library. If you want to fiddle with low-level networking, this is not the library you are looking for.
There was a talk at this year's C++Now about libcppa and VAST. VAST is a distributed, interactive database allowing you to do full-text search over gigabytes (that's only what's working right now, VAST aims for scanning petabytes!) of data in realtime, i.e., sub-second round-trip times. Matthias Vallentin gave a great talk about his design - purely based on libcppa actors! The flow-control needed to do the indexing in realtime (which btw is a constant stream of events) is built *on top* of actors. Not the other way around. Matthias tried a ZeroMQ design first, you might want to ask him how well that went...
Having a low level of abstraction by no means implies good performance or scalability. This is a fundamental design decision and I want people to write code on a sane, reasonable level of abstraction. If you can't reason about your code, it doesn't matter how "efficient" your building blocks are. The actor model is so appealing because it *takes away* the complexity of distributed runtime environments. And guess what? You can get insane performance out of actor systems with less headache. If you don't believe me that actor systems scale, go have a look at the selection of Production Users at http://akka.io/ and see for yourself. Those companies pick Scala and Java over C++ for *performance-critical applications* because of the actor model.
Boost.Actor implements its own network protocol, but you often need to integrate with an existing protocol, such as MQTT [7] or DDS [8].
You can integrate any network protocol by using brokers: http://neverlord.github.io/boost.actor/manual/#sec45
There's an example of how to integrate Google Protobuf in libcppa: https://github.com/Neverlord/libcppa/tree/master/examples/remote_actors
We can add distribution by having proxies. The proxies can hide the details about routing (e.g. actors may change location due to load balancing or migration,) and network protocol.
That's exactly how it's done.
Library reuse
-------------
Although Boost.Actor reuses other Boost libraries, it has implemented quite a lot that either exists in other Boost libraries, or that could be moved to those.
You have already mentioned that you do not use MPL, Serialization, and Asio, so I will not delve into these, other than saying that I believe that having your own socket implementation instead of using Boost.Asio is a show-stopper.
Aren't peer reviews about interface design, documentation, and testing? I cannot believe an implementation detail can be a show-stopper. To be quite frank, I just don't care about it. I do care about performance. As long as Asio delivers equal or better performance, I'll migrate sooner rather than later. But to me, this is an unimportant implementation detail.
Apart from these three, there are other libraries that should be considered. Boost.Actor has:
o Own continuations instead of Boost.Thread (future::then)
Futures don't deal with messages and know nothing about the scheduling in boost.actor. The syntax is similar, but the continuations used in boost.actor are syntactic sugar for the message passing underneath.
o Own producer-consumer queue instead of Boost.Lockfree
The producer-consumer queue used in the scheduler is based on an excellent Dr. Dobb's article by Herb Sutter [2] and performs reasonably well. Maybe there's interest in adding it to Boost.
o Own logging framework instead of Boost.Log, although I would prefer not having logging in a library at all.
As a user, you won't have logging. It's purely for debugging purposes and not compiled unless you define the macros to do so. It really should be in the detail namespace though.
o Own UUID instead of Boost.UUID
Can this library give me the UUID of the first hard drive? That's the only use case I have for this. The generators in the documentation don't mention anything like this.
o Own time duration instead of Boost.Chrono
The reason is the same as why I don't use std::chrono::duration: they are templated. I need a generic duration type that has the unit as a member rather than as a template parameter and that can also be invalid. Maybe I could replace this with optional<std::chrono::milliseconds> in the future, though it would mean hardcoding the maximum resolution.
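As a rough illustration of that requirement, here is a minimal sketch of a non-templated duration that stores its unit at runtime and has an explicit invalid state. The names `time_unit`, `generic_duration`, and `in_microseconds` are hypothetical and not the type used in boost.actor; the sketch also assumes non-negative durations.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Unit stored as a runtime value instead of a template parameter.
enum class time_unit { invalid, seconds, milliseconds, microseconds };

// Illustrative duration type (hypothetical name).
struct generic_duration {
  time_unit unit;
  std::uint64_t count;

  // A default-constructed duration is invalid, e.g. "no timeout set".
  generic_duration() : unit(time_unit::invalid), count(0) {}

  // Conversions from std::chrono keep the unit the caller used.
  generic_duration(std::chrono::seconds d)
      : unit(time_unit::seconds), count(to_count(d.count())) {}
  generic_duration(std::chrono::milliseconds d)
      : unit(time_unit::milliseconds), count(to_count(d.count())) {}
  generic_duration(std::chrono::microseconds d)
      : unit(time_unit::microseconds), count(to_count(d.count())) {}

  bool valid() const { return unit != time_unit::invalid; }

private:
  // Assumption of this sketch: negative durations are not supported.
  static std::uint64_t to_count(long long c) {
    assert(c >= 0);
    return static_cast<std::uint64_t>(c);
  }
};

// Normalizes to microseconds; an invalid duration yields 0.
inline std::uint64_t in_microseconds(const generic_duration& d) {
  switch (d.unit) {
    case time_unit::seconds:      return d.count * 1000000u;
    case time_unit::milliseconds: return d.count * 1000u;
    case time_unit::microseconds: return d.count;
    default:                      return 0;
  }
}
```

Because the type is not a template, it can be stored in a plain member or passed through a non-template interface regardless of the unit the caller chose.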
Then there is code that could be refactored to other Boost libraries so it can be used in other contexts. For example:
o Stacktrace dumper
o RIPEMD hash function
o MAC address
Agree, except for the stacktrace dumper. Unless someone else refactors it to work on Windows.
I hope I could convince you that you are not requesting changes to boost.actor. What you want is a different library entirely. An actor *is* the fundamental building block, that's what actor programming is all about.
[1] http://libcppa.blogspot.de/2011/04/mailbox-part-1.html
[2] http://www.drdobbs.com/parallel/writing-a-generalized-concurrent-queue/21160...