[futures] Request for Comments

Hi,

Hartmut and I wrote a small library which implements futures for C++. I uploaded the code to the boost-sandbox vault. If you find anything odd in there (like e.g. overloading operator T()), you have to blame me; if you find interesting stuff in there, you have to blame Hartmut.

The most basic future is simple_future<T>. Its constructor takes a function which returns T and spawns a new thread executing this function. Afterwards you can call operator()() to block until the thread is done and get the result.

    int slow_fac(int n)
    {
        if (n == 0) return 0;
        if (n == 1) return 1;
        sleep(1);
        return n * slow_fac(n - 1);
    }

    simple_future<int> f1 = bind(slow_fac, 4);
    ...
    cout << f1() << endl; // should print 24

If you have two different implementations of fac and you are only interested in the result of the faster implementation, you can combine two futures using '||'.

    int fast_fac(int n)
    {
        if (n == 0) return 0;
        if (n == 1) return 1;
        return n * fast_fac(n - 1);
    }

    simple_future<int> f2 = bind(fast_fac, 4);
    future<int> f3 = f1 || f2;
    ...
    cout << f3() << endl; // should print 24

If the result types of the futures given to '||' differ, the result type of the new future is a variant over the types.

    double fac4()
    {
        return (double)fast_fac(4);
    }

    future<variant<int, double> > f4 = f2 || double_[&fac4]; // double_[] creates a simple_future<double>

Similarly, the '&&' operator creates a tuple of the types of the individual futures and blocks until all futures are done.

    future<tuple<int, double> > f5 = f2 && double_[&fac4];

Warning: I overloaded operator T() for futures to get a nicer syntax.

    int fac_of_4 = f1; // <- this works!

Thorsten

Thorsten Schuett <schuett@zib.de> writes:
Hartmut and I wrote a small library which implements futures for C++. I uploaded the code to the boost-sandbox vault. If you find anything odd in there (like e.g. overloading operator T()), you have to blame me, if you find interesting stuff in there, you have to blame Hartmut.
The most basic future is simple_future<T>. Its constructor takes a function which returns T and spawns a new thread executing this function. Afterwards you can call operator()() to block until the thread is done and get the result.
<snip> You asked for comments, so here goes: I find the apparently copious use of implicit conversions disturbing. Especially disturbing is the fact that futures seem to have both incoming and outgoing implicit conversions. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Tuesday 16 August 2005 15:23, David Abrahams wrote:
Thorsten Schuett <schuett@zib.de> writes:
Hartmut and I wrote a small library which implements futures for C++. I uploaded the code to the boost-sandbox vault. If you find anything odd in there (like e.g. overloading operator T()), you have to blame me, if you find interesting stuff in there, you have to blame Hartmut.
The most basic future is simple_future<T>. Its constructor takes a function which returns T and spawns a new thread executing this function. Afterwards you can call operator()() to block until the thread is done and get the result.
<snip>
You asked for comments, so here goes: I find the apparently copious use of implicit conversions disturbing. Especially disturbing is the fact that futures seem to have both incoming and outgoing implicit conversions.
Hartmut was already strongly opposed to including the operator T() for futures. I guess that I have to remove it :-(
The implicit conversion in the other direction is a little bit harder to get rid of. The type representing futures composed using '||' is currently too complex to easily write down. But I need a handle to this future to pass it around. I see two solutions:
- we make the type less complex, but I am not sure whether that is possible;
- we add an explicit conversion function and remove the implicit conversion.
Thorsten

Thorsten Schuett <schuett@zib.de> writes:
The implicit conversion in the other direction is a little bit harder to get rid of. The type representing futures composed using '||' is currently to complex to easily write it down. But I need a handle to this future to pass it around.
?? I don't see any obvious relationship between operator || and implicit conversions. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Hartmut and I wrote a small library which implements futures for C++. I uploaded the code to the boost-sandbox vault. If you find anything odd in there (like e.g. overloading operator T()), you have to blame me, if you find interesting stuff in there, you have to blame Hartmut.
The most basic future is simple_future<T>. Its constructor takes a function which returns T and spawns a new thread executing this function. Afterwards you can call operator()() to block until the thread is done and get the result.
<snip>
You asked for comments, so here goes: I find the apparently copious use of implicit conversions disturbing. Especially disturbing is the fact that futures seem to have both incoming and outgoing implicit conversions.
In direct discussions with Thorsten I strongly objected against having an operator T(); he wants to see the comments on this list anyway... Regarding your mention of incoming implicit conversions: where do you think these may take place? Do I miss something? Regards Hartmut

"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
Regarding your mention of incoming implicit conversions: where do you think these may take place? Do I miss something?
simple_future<int> f1 = bind(slow_fac, 4); This won't work if the single-argument ctor for simple_future<int> used here is marked explicit, so I conclude that it is not. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Regarding your mention of incoming implicit conversions: where do you think these may take place? Do I miss something?
simple_future<int> f1 = bind(slow_fac, 4);
This won't work if the single-argument ctor for simple_future<int> used here is marked explicit, so I conclude that it is not.
The constructor was intended to be non-explicit, just to allow for such implicit conversions from any nullary functor. Is that too dangerous? What's currently missing in the code is a concept check that would allow spotting the assignment of other types more easily (right now the compilation fails somewhere deep inside the library if another type - other than a nullary functor - is used to initialise a simple_future). Regards Hartmut

"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
David Abrahams wrote:
Regarding your mention of incoming implicit conversions: where do you think these may take place? Do I miss something?
simple_future<int> f1 = bind(slow_fac, 4);
This won't work if the single-argument ctor for simple_future<int> used here is marked explicit, so I conclude that it is not.
The constructor was intended to be non-explicit, just to allow for such implicit conversions from any nullary functor. Is that too dangerous?
Not necessarily, by itself. However, you also have outgoing implicit conversions. The combination is very dangerous. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
David Abrahams wrote:
Regarding your mention of incoming implicit conversions: where do you think these may take place? Do I miss something?
simple_future<int> f1 = bind(slow_fac, 4);
This won't work if the single-argument ctor for simple_future<int> used here is marked explicit, so I conclude that it is not.
The constructor was intended to be non-explicit, just to allow for such implicit conversions from any nullary functor. Is that too dangerous?
Not necessarily, by itself.
I thought so :-P
However, you also have outgoing implicit conversions. The combination is very dangerous.
I agree with that. Please forget about the outgoing implicit conversions. These will be removed for sure. Regards Hartmut

Hi,

I hope I can bother you with some general questions:

1) I've never heard the notion of a "future" in this context -- what does it mean?

2) How does your library relate to the boost thread library? I only once looked into boost::thread, and I came to the conclusion that it wouldn't be usable for me, since the main application for which I want to use threads is as follows: Given two functions

    bool f1(X x);
    bool f2(X x);

I want to compute the logical or (short-circuit!). Now boost::thread did not support thread cancellation or termination in version 1_32, and it seems that development of boost::thread has been cancelled itself. Cancelling a thread is essential to gain the speed-up possible through parallel short-circuit evaluation, and thus boost::thread seems not to be usable. Now how are you proceeding? In the FAQ of boost::thread one can read

----
9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe (and portable) mechanism for this concept.
----

How do you cancel threads? Can one compute the short-circuit parallel or with your "futures"?!

3) How is access to common storage handled with your approach?! (I would appreciate a short explanation using not too much jargon --- yet I have only written one multi-threaded program (and that to explore the boost::thread library).)

Oliver

Oliver Kullmann wrote:
I hope I can bother you with some general questions:
1) I've never heard the notion of a "future" in this context -- what does it mean?
A future represents the result of an asynchronous computation. Methods are provided to check whether the computation is complete, to wait for its completion, and to retrieve its result. The result can only be retrieved using get() or operator()() once the computation has completed, blocking if necessary until it is ready. Cancellation is currently missing because of the missing boost::thread::kill(). That's an open issue with the current implementation.
2) How is the relation of your library to the boost thread library? I only once looked into boost::thread, and I came to the conclusion, that it wouldn't be usable for me, since the main application for which I want to use threads is as follows:
The futures library is implemented on the top of boost::thread.
Given two functions
bool f1(X x); bool f2(X x);
I want to compute the logical or (short-circuit!).
Now boost::thread did not support thread cancellation or termination in version 1_32, and it seems that development of boost::thread has been cancelled itself.
It does not support thread cancellation in V1.33.0 either.
To cancel a thread is essential to gain the speed-up possible by the parallel short-circuit evaluation, and thus boost::thread seems not to be usable.
Now how are you proceeding? In the FAQ's of boost::thread one can read
----
9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe (and portable) mechanism for this concept.
----
How do you cancel threads?
We don't cancel threads for the reasons outlined above. I'm not a threading expert, but I think that threads shouldn't be cancelled from the 'outside'; threads should generally cancel themselves. Once boost::thread comes up with a general solution, cancellation will be added to the futures library as well.
Can one compute the short-circuit parallel or with your "futures" ?!
The expression:

    int func1();
    int func2();

    future<int> f = int_[&func1] || int_[&func2];

creates a composite future (each of the functions is wrapped into a boost::thread), which, when evaluated as

    int result = f();

either returns immediately if one of the functions (func1 or func2) has already returned, or blocks until one of them returns. The return value of f() is the value returned by the function which finished first. Is that what you had in mind by mentioning 'short-circuit evaluation'?
3) How is access to common storage handled with you approach?! (I would appreciate a short explanation using not too much jargon --- yet I have only written one multi-threaded program (and that to explore the boost::thread library).)
That's completely up to you. The futures library does not care about what the wrapped functions actually do. Regards Hartmut

Hi Hartmut, thanks for your reply.
Can one compute the short-circuit parallel or with your "futures" ?!
The expression:

    int func1();
    int func2();

    future<int> f = int_[&func1] || int_[&func2];

creates a composite future (each of the functions is wrapped into a boost::thread), which, when evaluated as

    int result = f();

either returns immediately if one of the functions (func1 or func2) has already returned, or blocks until one of them returns. The return value of f() is the value returned by the function which finished first.
Is that what you had in mind by mentioning 'short-circuit evaluation'?
The problem here is that the logical or can only be short-circuited if one of the operands is true, in which case the output is true; but if one of the functions returns false, then the return value is identical to the return value of the remaining function. To say it more precisely: Given

    bool f1();
    bool f2();

an expression

    bool result = parallel_and_shortcircuit_or(f1, f2);

is needed, such that the following happens:

1) The evaluation of parallel_and_shortcircuit_or(f1, f2) starts two independent threads for f1 and f2, and the main thread blocks.

2) As soon as f1 or f2 returns true, the following happens:
   a) result is set to true, and the main thread continues;
   b) say, f1 returned true: f2 is then cancelled (if still running), and so does not waste computing resources anymore.

3) Otherwise, both threads have to complete, and result is set to false.

Unlike your solution above, we are happy with the evaluation of only one of f1 or f2 only in the case where the function returned true; and I guess, due to the missing cancellation of threads, in your solution one thread will still continue to run (and use resources) even after result has been computed and the main thread is continuing, right?

Oliver

Oliver Kullmann wrote:
thanks for your reply. [snip]
Unlike your solution above, we are happy with the evaluation of only one of f1 or f2 only in the case where the function returned true; and I guess, due to the missing cancellation of threads, in your solution one thread will still continue to run (and use resources) even after result has been computed and the main thread is continuing, right?
That's very much related to the idea Martin was talking about in his post wrt failing futures. We'll have to think about how to allow such behaviour. The current implementation relies on the futures returning some 'real' result value, not a flag signalling success. And yes, in our case (as long as we don't add some cooperative thread cancelling) the remaining thread will run until it's finished. Regards Hartmut

On Wed, Aug 17, 2005 at 10:15:11AM -0500, Hartmut Kaiser wrote:
Oliver Kullmann wrote:
thanks for your reply. [snip]
Unlike your solution above, we are happy with the evaluation of only one of f1 or f2 only in the case where the function returned true; and I guess, due to the missing cancellation of threads, in your solution one thread will still continue to run (and use resources) even after result has been computed and the main thread is continuing, right?
That's very much related to the idea Martin was talking about in his post wrt failing futures. We'll have to think about how to allow such behaviour. The current implementation relies on the futures returning some 'real' result value, not a flag signalling success.
You are right, my case could be modelled using "failing futures". But the approach seems somewhat too special to me. Consider for example

    int f1();
    int f2();

    int result = run_parallel_and_evaluate_shortcircuit( f1() * f2() );

(the syntax is not real, but I hope it's clear what I mean). Here, if one of f1 or f2 returns 0, then result = 0 (and the other thread can be cancelled), while otherwise both f1 and f2 have to be computed.

The underlying theme is that an expression is to be evaluated which uses certain function calls f_i, and that under certain circumstances not all of the f_i have to be evaluated (we can short-circuit the evaluation). With such a functionality one could also handle Martin's 'futures return a result with quality == "best"'. The general mechanism could be as follows (using mathematical notation):

    create_thread_and_evaluate( (f_1, c_1), (f_2, c_2), ..., (f_n, c_n), e(y_1, ..., y_n) );

This expression starts n threads running f_1, ..., f_n. If f_i finishes with result y_i, then c_i(y_i) is called; c_i(y_i) has the possibility to cancel all the other threads and compute the result directly. Otherwise all results are collected and the value e(y_1, ..., y_n) is returned. The c_i are conditions which tell us under what circumstances the result of f_i alone is sufficient (and furthermore they compute the result in this case), while e(y_1, ..., y_n) is the default evaluation.

Of course, there could be more complicated situations, where perhaps you have to compute 2 out of 3 values (but not all 3). This situation could be handled by passing to c_i the complete information about which f_i have terminated so far and with which values y_i --- c_i then either decides "that's enough" and handles the whole computation, or it says "so well, continue".

Oliver

Oliver Kullmann wrote:
Given two functions
bool f1(X x); bool f2(X x);
I want to compute the logical or (short-circuit!).
Now boost::thread did not support thread cancellation or termination in version 1_32, and it seems that development of boost::thread has been cancelled itself. To cancel a thread is essential to gain the speed-up possible by the parallel short-circuit evaluation, and thus boost::thread seems not to be usable.
Now how are you proceeding? In the FAQ's of boost::thread one can read
----
9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe (and portable) mechanism for this concept.
----
How do you cancel threads?
The explanation in the FAQ doesn't explain the problem in detail. Here's a brief example:

    bool f1(X x)
    {
        X * y = new X;
        // if f1() is aborted at this point
        // *y won't be deallocated
        delete y;
        return false;
    }
Can one compute the short-circuit parallel or with your "futures" ?!
This is a common problem in the world of parallel computations. It's due to the nature of parallel execution, and it's independent of boost::thread, futures or any library. Basically, the solution is always the same: f1() polls a flag periodically, and if it is raised it exits. To cancel the thread you just raise the flag and the thread terminates itself:

    class A
    {
        Flag cancel_f1;

        bool f1(X x)
        {
            X * y = new X;
            for (int i = 1; i < 1000; i++)
            {
                // some computations, e.g. one iteration of the algorithm
                if (cancel_f1.is_raised())
                {
                    delete y;
                    return false;
                }
            }
            delete y;
            return actual_result;
        }
    };

    ...
    A a;
    a.run_f1_in_a_new_thread();
    a.cancel_f1.raise();
3) How is access to common storage handled with you approach?! (I would appreciate a short explanation using not too much jargon --- yet I have only written one multi-threaded program (and that to explore the boost::thread library).)
It isn't handled at all. You should use boost::thread synchronization primitives to handle it properly when you implement f1(). Andrey

9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe (and portable) mechanism for this concept.
----
How do you cancel threads?
The explanation in the FAQ doesn't explain the problem in detail. Here's a brief example:
    bool f1(X x)
    {
        X * y = new X;
        // if f1() is aborted at this point
        // *y won't be deallocated
        delete y;
        return false;
    }
So we would get a memory leak. The underlying reason is that threads don't have their own address space, and thus the problem seems inherent to the notion of threads?!
Can one compute the short-circuit parallel or with your "futures" ?!
This is a common problem in the world of parallel computations. It's due to the nature of parallel execution and it's independent from boost::thread, futures or any library.
Basically, the solution is always the same: f1() polls a flag periodically, and if it is raised it exits.
To cancel the thread you just raise the flag and the thread terminates itself:
    class A
    {
        Flag cancel_f1;

        bool f1(X x)
        {
            X * y = new X;
            for (int i = 1; i < 1000; i++)
            {
                // some computations, e.g. one iteration of the algorithm
                if (cancel_f1.is_raised())
                {
                    delete y;
                    return false;
                }
            }
            delete y;
            return actual_result;
        }
    };

    ...
    A a;
    a.run_f1_in_a_new_thread();
    a.cancel_f1.raise();
This polling solution doesn't look like a solution for what I have in mind. First of all, it's a source of inefficiency; but much more importantly, it does not work with generic components: the component doing some computation should not need to worry about the circumstances under which the computation is performed. But the polling solution forces each component to provide measures for its own abortion, and this looks like a design nightmare to me: The library I'm developing consists of many components solving some sort of "very hard computational problem" (for example AI ...). The components on their own, as well as any combination, should possibly run in a parallel computation. I don't believe that the polling solution, effectively polluting every component, is suitable for these demands.

Altogether, it seems that threads are not suitable at all for what I want to achieve, but *processes* are right here. Perhaps I should outline shortly for what purpose I need distributed computing: As I said, I'm developing a generic (actually, generative) library for solving hard problems (constraint satisfaction etc.). This library consists mainly of concepts and models, that is, data structures and algorithms. Due to the complicated problem domain, most of the time it's completely unknown which of the many competing algorithmic resources you should use. So I need to create an environment where the user of the library can start several algorithms (whether on a single processor or using multiple processors doesn't matter), monitor them if needed (how much progress is it making? perhaps we should abort it??), possibly getting a result from them, possibly aborting them. (The algorithms would all run independently, using their own data; they communicate only with the central control, and this in a quite restricted way. The main point is that it's easy to run whatever I want in parallel and control it.)
The main motivation for distributed computing (of course, if possible, not just on a single computer, but using the Internet ...) here is not just that I have multiple processors which I want to exploit; equally (or even more) important is the ease with which alternative algorithmic resources can be exploited, possibly gaining a super-linear speed-up (quite common in this area --- even on a single processor, partitioning the (large) search space into different parts and searching through them in parallel can be faster than doing it sequentially).

Originally it seemed clear to me that processes are the right tool here, while threads are not suitable. But then I didn't find any support in Boost for process control, only the boost thread library, and I wondered (hoped) that this could be used. But it seems that was just wishful thinking. So, to conclude, there seems to be no support in Boost out there, and my only chance is to use the C library and start forking?!?! Looks like a tough fate to me.

Now I hope I didn't bother the well-meaning reader too much, but it seemed necessary to me to outline what I actually need threads (or whatever there is) for. I would be thankful for any comments, links ...

Oliver

Oliver Kullmann <O.Kullmann@swansea.ac.uk> writes:
9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe (and portable) mechanism for this concept.
----
How do you cancel threads?
The explanation in the FAQ doesn't explain the problem in detail. Here's a brief example:
    bool f1(X x)
    {
        X * y = new X;
        // if f1() is aborted at this point
        // *y won't be deallocated
        delete y;
        return false;
    }
So we would get a memory leak.
It's not just memory leaks that are an issue. You generally get undefined behavior if the program continues without properly unwinding stack frames. See setjmp/longjmp.
The underlying reason is that threads don't have their own address space, and thus the problem seems inherent to the notion of threads?!
No, the problem is inherent to skipping destructors. "Everyone" in the threading community (including Mr. Butenhof, FWIW) seems to agree that asynchronous thread cancellation is pretty much untenable most of the time, and that something like exceptions that are initiated by polling (or calling routines that are designated "cancellation points") is the only safe way to cancel a thread.
Can one compute the short-circuit parallel or with your "futures" ?!
This is a common problem in the world of parallel computations. It's due to the nature of parallel execution and it's independent from boost::thread, futures or any library.
Basically, the solution is always the same: f1() polls a flag periodically, and if it is raised it exits.
To cancel the thread you just raise the flag and the thread terminates itself:
    class A
    {
        Flag cancel_f1;

        bool f1(X x)
        {
            X * y = new X;
            for (int i = 1; i < 1000; i++)
            {
                // some computations, e.g. one iteration of the algorithm
                if (cancel_f1.is_raised())
                {
                    delete y;
                    return false;
                }
            }
            delete y;
            return actual_result;
        }
    };

    ...
    A a;
    a.run_f1_in_a_new_thread();
    a.cancel_f1.raise();
This polling solution doesn't look like a solution for what I have in mind. First of all, it's a source of inefficiency; but much more importantly, it does not work with generic components: the component doing some computation should not need to worry about the circumstances under which the computation is performed.
If you believe that, you can't *possibly* believe that asynchronous cancellation is okay, because it makes the problems much worse.
But the polling solution forces each component to provide measures for its own abortion, and this looks like a design nightmare to me: The library I'm developing consists of many components solving some sort of "very hard computational problem" (for example AI ...). The components on their own, as well as any combination, should possibly run in a parallel computation. I don't believe that the polling solution, effectively polluting every component, is suitable for these demands.
It doesn't necessarily have to pollute every component. If, like most good libraries, your components are mostly exception-neutral, any system call that is allowed to throw an exception can be modified to throw cancellation exceptions, and everything will still work. Still, the issue of exactly how thread cancellation should work in C++ is the source of much contention. You might want to read through http://www.codesourcery.com/archives/c++-pthreads/thrd7.html#00278 and other threads on that list.
Altogether, it seems that threads are not suitable at all for what I want to achieve, but *processes* are right here.
Possibly. Process cancellation is much less troublesome than thread cancellation, because any state usually disappears along with the process, and any broken invariants along with it. However, you still have to watch out for shared memory.
Perhaps I outline shortly for what purpose I need distributed computing:
As I said, I'm developing a generic (actually, generative) library for solving hard problems (constraint satisfaction etc.). This library consists mainly of concepts and models, that is, data structures and algorithms. Due to the complicated problem domain, most of the time it's completely unknown which of the many competing algorithmic resources you should use. So I need to create an environment where the user of the library can start several algorithms (whether on a single processor or using multiple processors doesn't matter), monitor them if needed (how much progress is it making? perhaps we should abort it??), possibly getting a result from them, possibly aborting them. (The algorithms would all run independently, using their own data; they communicate only with the central control, and this in a quite restricted way. The main point is that it's easy to run whatever I want in parallel and control it.)
The main motivation for distributed computing (of course, if possible, not just on a single computer, but using the Internet ...) here is not just that I have multiple processors which I want to exploit; equally (or even more) important is the ease with which alternative algorithmic resources can be exploited, possibly gaining a super-linear speed-up (quite common in this area --- even on a single processor, partitioning the (large) search space into different parts and searching through them in parallel can be faster than doing it sequentially).
Originally it seemed clear to me, that processes are the right tool here, while threads are not suitable. But then I didn't find any support in Boost for process control, only the boost thread library, and I wondered (hoped) that this could be used. But it seems that was just wishful thinking.
You might want to look at the bottom of http://www.osl.iu.edu/research/pbgl/documentation/graph/index.html HTH, -- Dave Abrahams Boost Consulting www.boost-consulting.com

You might want to look at the bottom of http://www.osl.iu.edu/research/pbgl/documentation/graph/index.html
"Parallel Processing Library" http://www.osl.iu.edu/research/pbgl/documentation/parallel/ referenced there is exactly the "higher level framework for distributed computations" I suggested you look at. Unfortunately, it offers very little additional abstraction over the raw MPI2 C++ API, so its advantage is doubtful.

I was thinking of a library which offers more specialized work-flow concepts. For example, often a distributed environment is used just because there is a lot of data to be processed, and each piece of data can be processed completely independently, or without major cooperation between computational nodes. Examples are video rendering/processing (if it's not I/O bound), distributed C++ compilation, and SETI@Home/other BOINC projects. Such tasks fit a model which I call "std::transform with a stateless functor". MPI has abstractions for this (like MPI_Scatter/Gather), but they are very limited, inefficient and practically useless in many circumstances. Different node speeds, and differing, non-predictable amounts of resources needed to process each data unit, lead to node stalls. And MPI_Scatter requires an in-memory array instead of just an input, a past-the-end and an output iterator. Andrey

Oliver Kullmann wrote:
9. Why isn't thread cancellation or termination provided?
There's a valid need for thread termination, so at some point probably will include it, but only after we can find a truly safe
(and portable) mechanism for this concept.
----
How do you cancel threads?
The explanation in the FAQ doesn't explain the problem in detail. Here's a brief example:
    bool f1(X x)
    {
        X * y = new X;
        // if f1() is aborted at this point
        // *y won't be deallocated
        delete y;
        return false;
    }
So we would get a memory leak. The underlying reason is that threads don't have their own address space, and thus the problem seems inherent to the notion of threads?!
I illustrated only a memory leak from the C++ heap. Other kinds of memory and other resources can leak too. For example, temporary files won't be deleted (disk space leak), locks on shared objects won't be released (deadlocks), network sockets and other system handles won't be closed, etc. It's not just memory leaks, it's resource leaks. If your code acquires resources, it should free them. Or you should register all resources it acquires with an external entity, and this external entity will free them for you if you forget to. NT kernel services and the Apache resource manager are examples of such entities. If you have specific resources which aren't freed when you terminate a thread, one possible solution would be to write a special manager for these resources.
Can one compute the short-circuit parallel or with your "futures" ?!
This is a common problem in the world of parallel computations. It's due to the nature of parallel execution and it's independent from boost::thread, futures or any library.
Basically, the solution is always the same: f1() polls a flag periodically, and if it is raised it exits.
To cancel the thread you just raise the flag and the thread terminates itself:
class A
{
    Flag cancel_f1;

    bool f1(X x)
    {
        X * y = new X;
        for (int i = 1; i < 1000; i++) {
            // some computations, e.g. one iteration of the algorithm
            if (cancel_f1.is_raised()) {
                delete y;
                return false;
            }
        }
        delete y;
        return actual_result;
    }
};
...
A a;
a.run_f1_in_a_new_thread();
a.cancel_f1.raise();
This polling-solution doesn't look like a solution for what I have in mind.
Don't forget it's just a simple example. It can be made more elegant and efficient if you need it.
First of all, it's a source of inefficiency,
Poll rarely. There are a lot of ways to make this polling use only a small percentage of the total computation time. Polling has been used widely for different purposes for a long time, and a lot of efficient methods have been invented.
but much more important, it does not work with generic components: The component doing some computation should not need to worry about the circumstances under which the computation is performed.
From your posts I see that you are at early design phases for your distributed framework, and you don't see the whole picture yet. I understand your idea. But your code always has to comply with some requirements, even if it's generic. The requirement to terminate gracefully isn't that hard to implement. You'll face a lot of other problems and I bet that your components will inevitably have to worry about a lot of different other circumstances.
But the polling-solution forces each component to provide measures for its own abortion, and this looks like a design nightmare to me: The library I'm developing consists of many components solving some sort of "very hard computational problem" (for example AI ...). The components on their own as well as any combination should possibly run in a parallel computation. I don't believe that the polling-solution, effectively polluting every component, is suitable for these demands.
Altogether, it seems that threads are not suitable at all what I want to achieve, but *processes* are right here.
In general, processes aren't right either. They won't help you with all possible kinds of resources.
Perhaps I outline shortly for what purpose I need distributed computing:
As I said, I'm developing a generic (actually, generative) library for solving hard problems (constraint satisfaction etc.). This library consists mainly of concepts and models, that is, data structures and algorithms.
From my experience with MPI it looks like you should look at MPI clusters for your task. MPI is a framework for writing distributed applications. It's standard and widely used. I think that 99% of large computational systems/supercomputers are based on the MPI API. MPI supports multiprocessor systems and multicomputer network clusters. It's common for a library for solving hard problems to use MPI for work distribution, so IMO it would be a good choice because it enables interoperation of different libraries. For example, take a look at MPICH2 http://www-unix.mcs.anl.gov/mpi/mpich2/ It's one of the MPI implementations and I found it very easy to deploy.
Due to the complicated problem domain, most of the time it's completely unknown which of the many competing algorithmic resources you should use. So I need to create an environment where the user of the library can start several algorithms (whether on a single processor or using multiple processors doesn't matter), monitor them if needed (how much progress is it making? perhaps we should abort it??),
You see? Your components have to "provide measures" for reporting progress anyways. I think that integrating the abort poll into your progress report routine won't introduce any additional code pollution.
possibly getting a result from them, possibly aborting them. (The algorithms would all run independently, using their own data; they communicate only with the central control, and this in a quite restricted way. The main point is, that it's easy to run whatever I want in parallel and control it.)
The main motivation for distributed computing (of course, if possible, not just on a single computer, but using the Internet ...) here is not just that I have multiple processors which I want to exploit, but equally (or even more) important is the ease with which alternative algorithmic resources can be exploited, possibly gaining a super-linear speed-up (quite common in this area --- even on a single processor partitioning the (large) search space into different parts and searching through them in parallel can be faster than doing it sequentially).
The problem is that for such tasks your own high-level scheduler will be faster than multiple OS threads on a single CPU. So you should be careful about relying on OS threads/processes or MPI processes. When you introduce high-level domain-specific metrics into your work scheduler, you'll probably get better performance. But of course you'll still need support for multiple concurrent execution flows to be able to consume all available CPU power.
Originally it seemed clear to me, that processes are the right tool here, while threads are not suitable. But then I didn't find any support in Boost for process control, only the boost thread library, and I wondered (hoped) that this could be used. But it seems that was just wishful thinking.
So, to conclude, there seems to be no support in Boost out there, and my only chance there is to use the C library, and start forking ?!?! Looks like a tough fate to me.
No please :) To support execution over network you'll need to write a network daemon for work distribution and solve a lot of other problems. MPI supports all this out of the box and you don't need to reinvent the wheel. You definitely need MPI or a higher level framework for distributed computations. At the moment Boost doesn't provide such high-level abstractions.
Now I hope I didn't bother the well-meaning reader too much, but it seemed necessary to me to outline for what actually I would need threads or whatever there is.
I would be thankful for any comments, links ...
Please, don't use these "?!?!" and "??". It makes people think that you are screaming aggressively. Andrey

Andrey Melnikov wrote: [...]
How do you cancel threads?
You use pthread_cancel() and rely on mandatory cancel delivery at "shall occur" cancellation points. Victim threads shall also be prepared for "may occur" delivery at optional cancellation points and in async-cancel regions (when cancellation is enabled, of course).
The explanation in the FAQ doesn't explain the problem in detail. Here's a brief example:
bool f1(X x)
{
    X * y = new X;
    // if f1() is aborted at this point
    // *y won't be deallocated
    delete y;
    return false;
}
Your f1() is clearly async-cancel-UNSAFE as written. So what? regards, alexander.

Alexander Terekhov wrote:
Andrey Melnikov wrote: [...]
How do you cancel threads?
You use pthread_cancel() and rely on mandatory cancel delivery to "shall occur" cancellation points. Victim threads shall also be prepared to "may occur" delivery for optional cancellation points and async-cancel regions (when cancellation is enabled, of course).
The explanation in the FAQ doesn't explain the problem in details. Here's a brief example:
bool f1(X x) { X * y = new X; // if f1() is aborted at this point // *y won't be deallocated delete y; return false; }
Your f1() is clearly async-cancel-UNSAFE as written. So what?
The example just illustrates that simple asynchronous cancellation doesn't work in all cases. Threads should be aware of external cancellation and should be written properly to make the cancellation (synchronous or asynchronous) safe. Also remember we weren't speaking about the pthread approach. pthreads aren't the only thread abstraction library in the world. Andrey

Andrey Melnikov wrote: [...]
asynchronous cancellation doesn't work in all cases. Threads should be aware of external cancellation and should be written properly to make the cancellation (synchronous or asynchronous) safe.
Your comment is not a bad addition to the FAQ.
Also remember we weren't speaking about the pthread approach. pthreads aren't the only thread abstraction library in the world.
Really? Illuminate me. TIA. regards, alexander.

Alexander Terekhov wrote:
Andrey Melnikov wrote: [...]
asynchronous cancellation doesn't work in all cases. Threads should be aware of external cancellation and should be written properly to make the cancellation (synchronous or asynchronous) safe.
Your comment is not a bad addition to the FAQ.
What FAQ?
pthread isn't the
only thread abstraction library in the world.
Really? Illuminate me.
Are you kidding? There are TONS of these: OmniThread, Mozilla NSPR threads, Apache APR threads, Qt QThread. These are cross-platform ones, and there are also things like MFC CThread, Borland VCL TThread, .NET MC++ threads, private implementations like http://www.codeproject.com/threads/cthread.asp and many, many other private implementations: http://www.google.com/search?q=thread+library Andrey

Andrey Melnikov wrote: [...]
Your comment is not a bad addition to the FAQ.
What FAQ?
C/C++ threading FAQ. [...]
OmniThread,
Well, this one says "Much of the abstraction consists of simple C++ object wrappers around pthread calls."
Mozilla NSPR threads, Apache APR threads, Qt QThread ...
Also remember there are TONS of amusing drek out there. regards, alexander.

Alexander Terekhov wrote:
Andrey Melnikov wrote: [...]
Your comment is not a bad addition to the FAQ.
What FAQ?
C/C++ threading FAQ.
What is this?
[...]
OmniThread,
Well, this one says "Much of the abstraction consists of simple C++ object wrappers around pthread calls."
Mozilla NSPR threads, Apache APR threads, Qt QThread ...
Also remember there are TONS of amusing drek out there.
??? Andrey

"Thorsten Schuett" wrote:
Hartmut and I wrote a small library which implements futures for C++.
A note aside: David Kohlhoff who wrote the asio library (http://asio.sourceforge.net/) also implemented futures on top of his library. I am still trying to understand the whole thing, so I can only offer a pointer. /Pavel

--- Pavel Vozenilek <pavel_vozenilek@hotmail.com> wrote:
A note aside: David Kohlhoff who
I've been called many things, but never David ;)
wrote asio library (http://asio.sourceforge.net/) also implemented futures on the top of his library.
I am still at trying to understand the whole thing so I can offer only pointer.
Just to clarify, asio does not include an implementation of futures. Instead, I have attached a couple of programs (to which Pavel was referring) that illustrate how asio might be used to implement them. Some points of note:
- The demuxer class can easily be used as a thread pool to perform the work associated with a future (rather than spawning a thread just for a particular future).
- A future need not just be used for waiting for the result of a single long-running function. It could also be used to wait for an asynchronous operation, or chain of asynchronous operations, to complete.
My personal preference in futures is to have a reference counting handle->body implementation, where the future class itself would have little more than operations (or operators) for setting and waiting/getting. You would then have a separate function for creating a new future from a simple function or composed function object (rather than implicit conversion from a function object), e.g.

template <typename Functor>
future<typename result_of<Functor>::type> make_future(demuxer& d, Functor f);

Other similar functions could be defined for creating futures that use a specially spawned thread to do the work, etc. This approach would also permit the use of futures on their own when needed (such as in the chain-of-async-operations use case). Cheers, Chris

I love the concept of futures and look forward to the completion of this library. However, I do have one criticism. On 8/16/05, Thorsten Schuett <schuett@zib.de> wrote:
//double_[] creates a simple_future<double>
Forgive me if I am missing the obvious, but is there a reason as to why you have double_ and int_, etc. with overloaded operator[] to create a simple_future, as opposed to simply having a "make_future" function which automatically detects which instantiation of simple_future should be made based off of the return type of the function pointer/function object being passed? In other words, why have

double_[ &double_func ];
int_[ &int_func ];

when you can simply have

make_future( &double_func ); // detects the double return type and returns a simple_future< double >
make_future( &int_func );    // detects the int return type and returns a simple_future< int >

Having a make_future function gets rid of the need to explicitly state the return type of the function in the future's creation. Even still, if you actually want to be explicit, then why would you make the double_ and int_ types when the desired functionality already exists by just specifying the template arguments of simple_future? double_[ &double_func ] seems to offer no benefit over simple_future< double >( &double_func ), other than possibly less code to type. It seems to me like completely unnecessary code, especially since you have similar types for int and I'd imagine other built-ins. I don't really see an added benefit of these objects concerning the manner in which you are currently using them. -Matt Calabrese

Thorsten Schuett wrote:
future<int> f3 = f1 || f2;
We'll probably need more than operator ||. E.g. consider f1 and f2 implement operations that consume time and may fail, e.g.: future<whatever> x = query_google_and_check_links_returned || dig_in_local_incomplete_archive; The local archive is faster than searching something over the internet, but the search may fail. Having an or-operation that waits for the first function to return a *success* would be useful. In another reply, thread-cancellation was mentioned. We don't have thread-cancellation currently, but it would be nice if future<> would not only accept functors but also allow for an optional interface to be passed for signaling an abort request. A user would then be able to implement a mechanism that works for her/him. Maybe, an abort policy would be the right thing here. Regards, m Send instant messages to your online friends http://au.messenger.yahoo.com

On Wednesday 17 August 2005 13:39, Martin Wille wrote:
Thorsten Schuett wrote:
future<int> f3 = f1 || f2;
We'll probably need more than operator ||.
E.g. consider f1 and f2 implement operations that consume time and may fail, e.g.:
future<whatever> x = query_google_and_check_links_returned || dig_in_local_incomplete_archive;
The local archive is faster than searching something over the internet, but the search may fail. Having an or-operation that waits for the first function to return a *success* would be useful. Interesting, I never thought of a "failing" future. We could end up with something like the following:
simple_future<optional<int> > f1;
simple_future<optional<double> > f2;
future<optional<variant<int, double> > > f3 = f1 || f2;

where an empty optional represents failure.
In another reply, thread-cancellation was mentioned. We don't have thread-cancellation currently, but it would be nice if future<> would not only accept functors but also allow for an optional interface to be passed for signaling an abort request. A user would then be able to implement a mechanism that works for her/him. Maybe, an abort policy would be the right thing here. This depends on the cancellation possibilities in boost.thread.
Thorsten

Thorsten Schuett wrote:
In another reply, thread-cancellation was mentioned. We don't have thread-cancellation currently, but it would be nice if future<> would not only accept functors but also allow for an optional interface to be passed for signaling an abort request. A user would then be able to implement a mechanism that works for her/him. Maybe, an abort policy would be the right thing here.
This depends on the cancelation possibilities in boost.thread.
No, it doesn't. The whole point is that the user is enabled to implement his own abort-signaling mechanism (e.g. by setting a flag that gets read by the running thread, or by writing to an additional file descriptor used in a call to select() in the code run by the thread). Regards, m

Martin Wille wrote:
future<int> f3 = f1 || f2;
We'll probably need more than operator ||.
E.g. consider f1 and f2 implement operations that consume time and may fail, e.g.:
future<whatever> x = query_google_and_check_links_returned || dig_in_local_incomplete_archive;
The local archive is faster than searching something over the internet, but the search may fail. Having an or-operation that waits for the first function to return a *success* would be useful.
Interesting idea! Could we signal that by requesting the function to throw a special exception?
In another reply, thread-cancellation was mentioned. We don't have thread-cancellation currently, but it would be nice if future<> would not only accept functors but also allow for an optional interface to be passed for signaling an abort request. A user would then be able to implement a mechanism that works form her/him. Maybe, an abort policy would be the right thing here.
That is possible but requires a special functor interface. Will have to think about this. Regards Hartmut

Hartmut Kaiser wrote:
Martin Wille wrote:
future<int> f3 = f1 || f2;
We'll probably need more than operator ||.
E.g. consider f1 and f2 implement operations that consume time and may fail, e.g.:
future<whatever> x = query_google_and_check_links_returned || dig_in_local_incomplete_archive;
The local archive is faster than searching something over the internet, but the search may fail. Having an or-operation that waits for the first function to return a *success* would be useful.
Interesting idea! Could we signal that by requesting the function to throw a special exception?
I think that would be viable. Another approach would be to pass a combiner (similar to what Boost.Signals does) and to use optional<whatever> as return types. Regards, m

On Wednesday 17 August 2005 16:06, Martin Wille wrote:
Hartmut Kaiser wrote:
Martin Wille wrote:
future<int> f3 = f1 || f2;
We'll probably need more than operator ||.
E.g. consider f1 and f2 implement operations that consume time and may fail, e.g.:
future<whatever> x = query_google_and_check_links_returned || dig_in_local_incomplete_archive;
The local archive is faster than searching something over the internet, but the search may fail. Having an or-operation that waits for the first function to return a *success* would be useful.
Interesting idea! Could we signal that by requesting the function to throw a special exception?
I think that would be viable. Another approach would be to pass a combiner (similar to what Boost.Signals does) and to use optional<whatever> as return types. As I wrote above, I would prefer using optional<T> in this case.
Thorsten

Thorsten Schuett wrote:
On Wednesday 17 August 2005 16:06, Martin Wille wrote:
I think that would be viable. Another approach would be to pass a combiner (similar to what Boost.Signals does) and to use optional<whatever> as return types.
optional<> would be insufficient if we don't have only success/failure but some quality indicator. We could return a result if one of the futures returns a result with quality == "best". Otherwise, we have to wait until all the futures involved have returned a result. (Something like that would be useful for running several heuristic algorithms in parallel with a suitable definition of "best") Regards, m

On Wednesday 17 August 2005 17:53, Martin Wille wrote:
Thorsten Schuett wrote:
On Wednesday 17 August 2005 16:06, Martin Wille wrote:
I think that would be viable. Another approach would be to pass a combiner (similar to what Boost.Signals does) and to use optional<whatever> as return types.
optional<> would be insufficient if we don't have only success/failure but some quality indicator. We could return a result if one of the futures returns a result with quality == "best". Otherwise, we have to wait until all the futures involved have returned a result.
(Something like that would be useful for running several heuristic algorithms in parallel with a suitable definition of "best") How about adding a future-container + an iterator over this container? Probably we can use the 'or'-future as the container.
simple_future<Result> f1;
...
simple_future<Result> fn;

future<Result> f = f1 || ... || fn;

Result result;
for (future<Result>::iterator it = f.begin(); f.end() != it; ++it) {
    if (*it "better than" threshold) {
        result = *it;
        // f.cancel() ???
        break;
    }
}

Then you could iterate over the future until your result is good enough. Using this interface everybody can create his own compositions. Thorsten

Hartmut Kaiser wrote:
Interesting idea! Could we signal that by requesting the function to throw a special exception?
I like the optional<T> return better; I think signals-style combiners might be better for more complicated results. Perhaps the signature for a worker function could be:

optional<T> worker_function(bool (*continue_processing)(void));

where T is the result type of the simple_future object. Returning a non-empty optional indicates success, and the 'continue_processing' callback should allow the worker function to query back into the future object to test for whether its result is still awaited. This would allow the future object to do any concurrency management of the continuation flag. Matt

Thorsten Schuett wrote:
Hartmut and I wrote a small library which implements futures for C++. I uploaded the code to the boost-sandbox vault. If you find anything odd in there (like e.g. overloading operator T()), you have to blame me, if you find interesting stuff in there, you have to blame Hartmut.
What Thorsten forgot to mention is that the library is available in the boost-sandbox file vault: http://tinyurl.com/aw43j (file futures.zip) Regards Hartmut
participants (11)
- Alexander Terekhov
- Andrey Melnikov
- Christopher Kohlhoff
- David Abrahams
- Hartmut Kaiser
- Martin Wille
- Matt Calabrese
- Matt Vogt
- Oliver Kullmann
- Pavel Vozenilek
- Thorsten Schuett