Re: [boost] Is there any interest in a library for actor programming? [preliminary submission]

17 May 2014

      On Sat, May 17, 2014 at 10:17:25AM +0200, Bjorn Reese wrote:
...
I have a couple of major concerns with the current submission, and I am
going to suggest some substantial changes. I hope that it does not
discourage you too much.
I am going to suggest that:
1. The library is broken into more fundamental building-blocks (which
   is what both Boost and the C++ standard are all about.)
2. A more flexible data flow architecture is adopted.
3. More use of existing Boost libraries.
I recognize three more fundamental building-blocks in the current
submission: active objects, messaging middleware, and data flow. I am
not against a higher level actor API, but the fundamentals need to be
in place first.
Thank you for taking your time for this thorough comment. However, I have to
say I disagree on many levels. First of all: C++ is not about having a low
level of abstraction. C++ is about having the highest level of abstraction
possible without sacrificing performance. What you are suggesting is to not
have an actor library in Boost. You want to have a low-level active object
library with low-level networking primitives.
...
Building-blocks
---------------
Boost.Actor implements a distributed mailbox-based actor model. While
this is a building-block to some users, it is not fundamental. It
conflates the actor with distribution.
An actor *is* the fundamental primitive! We are talking about an implementation
of the actor model. The whole point of the actor model is to have software
entities that abstract over physical deployment. Actors are *not* the same
thing as active objects. There is obviously overlap of these two models, but
the actor model is more general, i.e., describes a higher level of abstraction.
...
I suggest that you start with a non-distributed actor model. This is
simply an active object with an incoming message queue. This can be
used in its own right without distribution and mailboxes. Many applications
have classes with a working thread inside them, and active objects
should strive to replace these classes [1].
A non-distributed actor model? I think what you really want to say here is: you
want users to be able to use the lightweight actor implementation and the
work-stealing scheduler without having link-dependencies to networking
infrastructure they don't use. Can you agree on that? That's a fair point and
useful indeed. Well, here's the thing: there is nothing baked into the actor
primitive that would require such a dependency. The software design is fully
modular. Separate the middleman, move publish/remote_actor to a different
namespace, ship it separately, done. All you have to do to extend boost.actor
is to provide an implementation for actor_proxy. Behind the scenes, you can do
all kinds of stuff: networking, OpenCL-binding (that's how it's done in
libcppa), you name it.
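
To make the term concrete: an "active object with an incoming message queue",
as described in the quoted suggestion, can be sketched in a few lines of
standard C++. This is a minimal illustration only. The class name
active_object and its send member are made up for this example and are not
part of boost.actor or libcppa, and messages are plain std::function objects
rather than typed actor messages.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical active object: one worker thread drains a queue of messages.
class active_object {
public:
  active_object() : worker_([this] { run(); }) {}

  // Processes everything enqueued so far, then stops and joins the worker.
  ~active_object() {
    send([this] { done_ = true; });
    worker_.join();
  }

  active_object(const active_object&) = delete;
  active_object& operator=(const active_object&) = delete;

  // Callable from any thread; the message runs later on the worker thread.
  void send(std::function<void()> msg) {
    assert(msg); // an empty std::function would throw when invoked
    {
      std::lock_guard<std::mutex> guard(mtx_);
      queue_.push_back(std::move(msg));
    }
    cv_.notify_one();
  }

private:
  void run() {
    while (!done_) {
      std::function<void()> msg;
      {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        msg = std::move(queue_.front());
        queue_.pop_front();
      }
      msg(); // one message at a time, in FIFO order, outside the lock
    }
  }

  std::mutex mtx_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool done_ = false;  // only read and written on the worker thread
  std::thread worker_; // declared last so it starts after the other members
};
```

Because all messages run on the single worker thread, state touched only from
inside messages needs no further locking. Note that this sketch has no notion
of addresses, proxies, or location transparency, which is where it differs
from an actor.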
...
Active objects have two important variation points: scheduling and the
queue type. Regarding scheduling there is a C++ standards proposal that
should be considered [2]. There is a GSoC project about this [3]. For
the distributed case, boost::asio::io_service also has to be considered.
I know the Executor proposal and I think it's too basic to be useful for
anything other than implementing std::async. ThreadPool implementations in
general do work-sharing, whereas boost.actor implements work-stealing. The
latter yields superior performance in almost all use cases. 

I wouldn't mind moving the scheduler to its own namespace to allow other
projects to build on top of that. The scheduler uses the interfaces resumable
and execution_unit, so there aren't any actor-specific types.
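
The difference between work-sharing and work-stealing can be illustrated with
a deliberately simple sketch: each worker owns a queue and takes jobs from one
end, and an idle worker steals from the other end of someone else's queue. The
names worker_queue and next_job are hypothetical, jobs are plain ints, and a
mutex stands in for the lock-free machinery a real scheduler would use. This
is not the boost.actor scheduler and does not use its resumable or
execution_unit interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical per-worker job queue.
class worker_queue {
public:
  void push(int job) {
    std::lock_guard<std::mutex> guard(mtx_);
    jobs_.push_back(job);
  }

  // Owner side: take the most recently pushed job (good cache locality).
  bool pop(int& out) {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return false;
    out = jobs_.back();
    jobs_.pop_back();
    return true;
  }

  // Thief side: take the oldest job from the opposite end.
  bool steal(int& out) {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return false;
    out = jobs_.front();
    jobs_.pop_front();
    return true;
  }

private:
  std::mutex mtx_;
  std::deque<int> jobs_;
};

// Next job for worker `self`: its own queue first, then try the other queues.
inline bool next_job(std::vector<worker_queue>& queues, std::size_t self,
                     int& out) {
  assert(self < queues.size());
  if (queues[self].pop(out)) return true;
  for (std::size_t i = 1; i < queues.size(); ++i) {
    std::size_t victim = (self + i) % queues.size();
    if (queues[victim].steal(out)) return true;
  }
  return false;
}
```

With work-sharing, all workers would instead contend on one shared queue for
every job. Here a worker only touches another worker's queue when its own is
empty.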
...
There is also work done on message queues. We already have some in
Boost.Lockfree, or the wait-free multi-producer queue in the
Boost.Atomic examples, as well as sync_queue in Boost.Thread.
Show me a queue that outperforms single_reader_queue and I'll take it. Keep in
mind though, that the enqueue operation uses only *a single* compare-and-swap
operation [1]. I don't see how you can outperform that. Did you have a look at
the performance evaluation? In particular the N:1 communication? This queue
scales up to 63 concurrent writers without a measurable performance hit. I'm not
passionate about implementation details, though. Show me a queue that performs
even better in boost.actor and I'll take it.
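
One common way to get an enqueue that needs only a single successful
compare-and-swap is to let producers push onto an atomic stack and let the
single reader detach the whole stack and reverse it. The sketch below assumes
that design. The names mpsc_queue, node, enqueue and try_dequeue are invented
here, the payload is a plain int, and the code should not be read as the
actual single_reader_queue implementation (see [1] for that).

```cpp
#include <atomic>
#include <cassert>

struct node {
  int value;
  node* next;
};

// Hypothetical multi-producer, single-consumer queue.
class mpsc_queue {
public:
  mpsc_queue() : stack_(nullptr), cache_(nullptr) {}

  ~mpsc_queue() {
    int ignored = 0;
    while (try_dequeue(ignored)) {
      // drain remaining nodes so they get deleted
    }
    assert(stack_.load() == nullptr && cache_ == nullptr);
  }

  mpsc_queue(const mpsc_queue&) = delete;
  mpsc_queue& operator=(const mpsc_queue&) = delete;

  // Any thread: link the new node in with one successful compare-and-swap.
  void enqueue(int value) {
    node* n = new node{value, stack_.load(std::memory_order_relaxed)};
    while (!stack_.compare_exchange_weak(n->next, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
      // a failed CAS stored the current head in n->next; just retry
    }
  }

  // Consumer thread only.
  bool try_dequeue(int& out) {
    if (cache_ == nullptr) {
      // Detach everything pushed so far (newest element first) ...
      node* head = stack_.exchange(nullptr, std::memory_order_acquire);
      // ... and reverse it so the oldest element ends up in front.
      while (head != nullptr) {
        node* next = head->next;
        head->next = cache_;
        cache_ = head;
        head = next;
      }
    }
    if (cache_ == nullptr) return false;
    node* n = cache_;
    cache_ = n->next;
    out = n->value;
    delete n;
    return true;
  }

private:
  std::atomic<node*> stack_; // shared: producers push, the consumer detaches
  node* cache_;              // private to the consumer, already in FIFO order
};
```

Producers never pop, and the consumer only ever swaps the whole stack for
nullptr, so the usual ABA problem of lock-free stacks does not arise in this
sketch. Under contention a producer may retry the CAS, but each enqueue
completes with exactly one successful one.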
...
Architecture
------------
Once we have got active objects, the question is how do we connect them?
The variation points here are routing and transmission.
Again, this library is not about active objects. It's about actors.
...
The mailbox approach is too simple for many applications. Partly
because it is too limited in some regards (e.g. push-only) and too
flexible in other regards (e.g. you cannot have fine-grained
access control or restricted visibility.)
There are several flow-based approaches that should be considered:
Boost.Iostreams has all the required concepts in place. There was a
Boost.Dataflow GSoC project [4] some years ago. There is a C++ standards
proposal [5] about C++ pipelines. See also the ZeroMQ guide [6] for
various examples.
All of that is true and it's damn good that libcppa/boost.actor is the way
it is. I'm sorry, but again: this is an actor library. If you want to fiddle
with low-level networking, this is not the library you are looking for. There
was a talk at this year's C++Now about libcppa and VAST. VAST is a distributed,
interactive database allowing you to do full-text search over gigabytes (that's
only what's working right now, VAST aims for scanning petabytes!) of data in
realtime, i.e., sub-second round-trip times. Matthias Vallentin gave a great
talk about his design - purely based on libcppa actors! The flow-control needed
to do the indexing in realtime (which btw is a constant stream of events) is
built *on top* of actors. Not the other way around. Matthias tried a ZeroMQ
design first, you might want to ask him how well that went... Having a low
level of abstraction by no means implies good performance or scalability.

This is a fundamental design decision and I want people to write code on a
sane, reasonable level of abstraction. If you can't reason about your code, it
doesn't matter how "efficient" your building blocks are. The actor model is so
appealing because it *takes away* the complexity of distributed runtime
environments. And guess what? You can get insane performance out of actor
systems with less headache. If you don't believe me that actor systems scale,
go have a look at the selection of Production Users at http://akka.io/ and see
for yourself. Those companies pick Scala and Java over C++ for
*performance-critical applications* because of the actor model.
...
Boost.Actor implements its own network protocol, but you often need to
integrate with an existing protocol, such as MQTT [7] or DDS [8].
You can integrate any network protocol by using brokers:
http://neverlord.github.io/boost.actor/manual/#sec45

There's an example how to integrate Google Protobuf in libcppa:
https://github.com/Neverlord/libcppa/tree/master/examples/remote_actors
...
We can add distribution by having proxies. The proxies can hide the
details about routing (e.g. actors may change location due to load
balancing or migration,) and network protocol.
That's exactly how it's done.
...
Library reuse
-------------
Although Boost.Actor reuses other Boost libraries, it has implemented
quite a lot that either exists in other Boost libraries, or that could
be moved to those.
You have already mentioned that you do not use MPL, Serialization, and
Asio, so I will not delve into these, other than saying that I believe
that having your own socket implementation instead of using Boost.Asio
is a show-stopper.
Aren't peer reviews about interface design, documentation, and testing? I
cannot believe an implementation detail can be a show-stopper. To be quite
frank, I just don't care about it. I do care about performance. As long as Asio
delivers equal or better performance, I'll migrate sooner rather than later.
But to me, this is an unimportant implementation detail.
...
Apart from these three, there are other libraries that should be
considered. Boost.Actor has:
o Own continuations instead of Boost.Thread (future::then)
Futures don't deal with messages and know nothing about the scheduling in
boost.actor. The syntax is similar, but the continuations used in boost.actor
are syntactic sugar for the message passing underneath.
...
o Own producer-consumer queue instead of Boost.Lockfree
The producer-consumer queue used in the scheduler is based on an excellent Dr.
Dobb's article by Herb Sutter [2] and performs reasonably well. Maybe there's
interest in adding it to Boost.
...
o Own logging framework instead of Boost.Log, although I would
    prefer not having logging in a library at all.
As a user, you won't have logging. It's purely for debugging purposes and not
compiled unless you define the macros to do so. It really should be in the
detail namespace though.
...
o Own UUID instead of Boost.UUID
Can this library give me the UUID of the first hard drive? That's the only use
case I have for this. The generators in the documentation don't mention
anything like this.
...
o Own time duration instead of Boost.Chrono
The reason is the same as why I don't use std::chrono::duration: they are
templated. I need a generic duration type that has the unit as a member rather
than as a template parameter and that can also be invalid. Maybe I could
replace this with optional<std::chrono::milliseconds> in the future, though it
would mean hardcoding the maximum resolution.
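
As a rough illustration of that requirement, a non-templated duration with a
runtime unit and an explicit invalid state could look like the sketch below.
The names time_unit, duration and to_microseconds are assumptions made for
this example and may differ from what boost.actor actually ships.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical set of units; `invalid` marks "no duration given".
enum class time_unit : std::uint8_t {
  invalid,
  seconds,
  milliseconds,
  microseconds
};

// Hypothetical duration type: the unit is a data member, not a template
// parameter, so every duration has the same C++ type.
struct duration {
  time_unit unit = time_unit::invalid;
  std::int64_t count = 0;

  duration() = default; // default-constructed durations are invalid

  duration(std::chrono::seconds d)
      : unit(time_unit::seconds), count(d.count()) {}
  duration(std::chrono::milliseconds d)
      : unit(time_unit::milliseconds), count(d.count()) {}
  duration(std::chrono::microseconds d)
      : unit(time_unit::microseconds), count(d.count()) {}

  bool valid() const { return unit != time_unit::invalid; }
};

// Converts a valid duration back to a std::chrono type.
inline std::chrono::microseconds to_microseconds(const duration& d) {
  assert(d.valid());
  switch (d.unit) {
    case time_unit::seconds:
      return std::chrono::seconds(d.count);
    case time_unit::milliseconds:
      return std::chrono::milliseconds(d.count);
    default:
      return std::chrono::microseconds(d.count);
  }
}
```

A function taking such a duration by value needs no template parameter and
can treat a default-constructed argument as "no timeout".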
...
Then there is code that could be refactored to other Boost libraries
so they can be used in other contexts. For example:
o Stacktrace dumper
o RIPEMD hash function
o MAC address
Agree, except for the stacktrace dumper. Unless someone else refactors it to
work on Windows.

I hope I could convince you that you are not requesting changes to boost.actor.
What you want is a different library entirely. An actor *is* the fundamental
building block, that's what actor programming is all about. 

[1] http://libcppa.blogspot.de/2011/04/mailbox-part-1.html
[2] http://www.drdobbs.com/parallel/writing-a-generalized-concurrent-queue/21160...

Dominik Charousset