Proposal: Library to manage processes

Hi everyone,
a while ago, I started to write a little utility that aims to interact with multiple VCS systems. I chose C++ (and I don't regret that decision at all), so I had to create multiple classes to interact with system internals and keep the "real" application code clean from such OS-specific calls.
A few days ago, I decided to switch to use Boost in that project, basically because I was interested in the Filesystem library. Doing so, I have been able to remove some of my custom classes that dealt with files and directories (and which were quite ugly). Furthermore, this change has allowed me to use many other Boost goodies.
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm willing to reorganize (well, mostly rewrite from scratch) such process-management classes to follow the Boost policies, aiming for future integration in Boost.
But, before I start to do so, I would like to know if such library will be adequate to be integrated and/or if there is interest in it. FWIW, I've searched the mailing lists and found some people that said that they missed this functionality in Boost.
Any comments?
Thanks in advance,
-- Julio M. Merino Vidal <jmmv84@gmail.com>
http://www.livejournal.com/users/jmmv/
The NetBSD Project - http://www.NetBSD.org/

Merino Vidal wrote:
a while ago, I started to write a little utility that aims to interact with multiple VCS systems. I chose C++ (and I don't regret that decision at all), so I had to create multiple classes to interact with system internals and keep the "real" application code clean from such OS-specific calls.
A few days ago, I decided to switch to use Boost in that project, basically because I was interested in the Filesystem library. Doing so, I have been able to remove some of my custom classes that dealt with files and directories (and which were quite ugly). Furthermore, this change has allowed me to use many other Boost goodies.
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm willing to reorganize (well, mostly rewrite from scratch) such process-management classes to follow the Boost policies, aiming for future integration in Boost.
But, before I start to do so, I would like to know if such library will be adequate to be integrated and/or if there is interest in it. FWIW, I've searched the mailing lists and found some people that said that they missed this functionality in Boost.
I'm definitely interested. Regards Hartmut

Julio M. Merino Vidal wrote:
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
There is clearly a need for this sort of library. I'd like to see something like this follow the style of Boost.Threads where that makes sense. Besides support for starting processes, accessing their IO streams, and waiting for them to complete, I'd like to see support for interprocess synchronization.
Also, it is extremely important that the interface does not hide features of the underlying OS interface, as is in vogue for many irresponsible "frameworks." For example, if the OS allows processes to be suspended, but the basic interface doesn't because not all popular OS's possess this ability, the library MUST make provisions for a user to be able to access this ability through some sort of extension. This may mean an implementation that uses runtime polymorphism rather than concrete types.
One very important implementation issue for this library is that it is aware of blocking operations. It's often forgotten that process-spawning code is blocking code due to its interaction with the filesystem. A black-box interface that might arbitrarily block for a long time, without giving the user proper indication that this may happen, is unacceptable.
Aaron
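One way to read the "extension" requirement above is an interface that stays portable but lets callers ask for optional, OS-specific capabilities at run time. The following is only a sketch under that reading, assuming a POSIX system; the names `process`, `suspendable`, `posix_process` and `extension()` are hypothetical and are not part of any Boost library.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Portable interface: only operations every platform is assumed to support.
class process {
public:
    virtual ~process() {}
    virtual int wait() = 0;  // blocks until the child ends, returns the raw status

    // Extension hook: returns a pointer to an optional capability,
    // or a null pointer if this implementation does not offer it.
    template <class Extension>
    Extension* extension() { return dynamic_cast<Extension*>(this); }
};

// Optional capability that only some operating systems can provide.
class suspendable {
public:
    virtual ~suspendable() {}
    virtual bool suspend() = 0;
    virtual bool resume() = 0;
};

// POSIX implementation (assumption: the pid refers to a child of the caller).
class posix_process : public process, public suspendable {
public:
    explicit posix_process(pid_t pid) : pid_(pid) {}

    int wait() {
        int status = 0;
        waitpid(pid_, &status, 0);
        return status;
    }
    bool suspend() { return kill(pid_, SIGSTOP) == 0; }
    bool resume() { return kill(pid_, SIGCONT) == 0; }
    bool terminate() { return kill(pid_, SIGKILL) == 0; }

private:
    pid_t pid_;
};
```

Code written against `process` stays portable, while code that needs suspension checks `extension<suspendable>()` for a null pointer first. Whether the cost of `dynamic_cast` is acceptable is a design decision this sketch does not settle.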

----- Original Message ----- From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
But, before I start to do so, I would like to know if such library will be adequate to be integrated and/or if there is interest in it. FWIW, I've searched the mailing lists and found some people that said that they missed this functionality in Boost.
Any comments?
Thanks in advance,
I am definitely very interested! I would like to see this kind of library have good integration with Boost.Iostreams ( http://www.kangaroologic.com/iostreams/ ). I would also like to have operator overloading for redirection and piping operations, e.g.
    ofstream f("test.cpp", ios_base::out | ios_base::trunc);
    f << "#include <iostream>\nint main() { std::cout << \"hello world\"; return 0; }";
    f.close();
    boost::process("gcc test.cpp");
    stringstream s;
    boost::process("a.out") > s;
    assert(s.str() == "hello world");
I am already using this syntax with my own ootl::filters library (which redirects functions instead of processes), and I would gladly help integrate the code (and make it more boost-worthy) with your library. The current version of the code is at http://www.ootl.org/ootl/filters/filters.hpp.htm and some examples of how I use it are at http://www.ootl.org/ootl/filters/filter_tests.hpp.htm
Let me know if I can be of any help.
-- Christopher Diggins
Object Oriented Template Library (OOTL)
http://www.ootl.org
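The redirection syntax suggested above could be prototyped roughly as follows. This is a sketch only, assuming a POSIX system where `popen()` is available; the `sketch::process` class is hypothetical and is neither the proposed library nor the ootl::filters code. Unlike the example in the message, constructing the object does not run anything here; the command runs when `operator>` is applied.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical value type that just remembers a shell command line.
class process {
public:
    explicit process(const std::string& command) : command_(command) {}
    const std::string& command() const { return command_; }

private:
    std::string command_;
};

// Runs the command through the shell via popen() and copies everything
// the child writes to its standard output into `sink`.
inline std::ostream& operator>(const process& p, std::ostream& sink)
{
    FILE* pipe = popen(p.command().c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen failed");
    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        sink.write(buffer, static_cast<std::streamsize>(n));
    pclose(pipe);
    return sink;
}

}  // namespace sketch
```

With this in place, `sketch::process("printf 'hello world'") > s;` fills a `std::stringstream s` with the child's output. Overloading a comparison operator this way is a matter of taste that a real proposal would have to justify.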

On Sun, 20 Feb 2005 18:42:24 +0100, Julio M. Merino Vidal wrote
Hi everyone,
...snip...
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm willing to reorganize (well, mostly rewrite from scratch) such process-management classes to follow the Boost policies, aiming for future integration in Boost.
This is a library I'm very interested in having in boost. Spawning processes is one of the most fundamental software integration techniques and I've had to write this in various ways over the years. Before you embark on the project you should be aware of some of the other libraries in the eco-system: http://libexecstream.sourceforge.net/ http://pstreams.sourceforge.net/ Also, ACE has tools for doing this, although the interface is quite complex and doesn't integrate with I/O streams. http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process__Ma... http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process.htm... And, of course, since you mention I/O stream integration (which I would expect), you will want to consider using the boost::iostream library. This library is now in CVS and will be part of 1.33. This will hopefully make the job easier. http://home.comcast.net/~jturkanis/iostreams/libs/iostreams/doc/
But, before I start to do so, I would like to know if such library will be adequate to be integrated and/or if there is interest in it. FWIW, I've searched the mailing lists and found some people that said that they missed this functionality in Boost.
Any comments?
For sure this is of interest, would love to see you take up the initiative to develop it. Jeff

On Sun, 20 Feb 2005 12:03:46 -0700, Jeff Garland wrote
Before you embark on the project you should be aware of some of the other libraries in the eco-system:
http://libexecstream.sourceforge.net/ http://pstreams.sourceforge.net/
Also, ACE has tools for doing this, although the interface is quite complex and doesn't integrate with I/O streams.
http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process__Ma...
http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process.htm...
And, of course, since you mention I/O stream integration (which I would expect), you will want to consider using the boost::iostream library. This library is now in CVS and will be part of 1.33. This will hopefully make the job easier.
http://home.comcast.net/~jturkanis/iostreams/libs/iostreams/doc/
One more reference. We had some discussion about something similar in mid-2004. AFAIK it never went anywhere: http://lists.boost.org/MailArchives/boost/msg06882.php Jeff

On Sun, 20 Feb 2005 16:17:22 -0700 "Jeff Garland" <jeff@crystalclearsoftware.com> wrote:
On Sun, 20 Feb 2005 12:03:46 -0700, Jeff Garland wrote
Before you embark on the project you should be aware of some of the other libraries in the eco-system: ... Also, ACE has tools for doing this, although the interface is quite complex and doesn't integrate with I/O streams.
Until I found boost about 4 years ago, ACE was my favorite place to learn about pragmatic, esoteric C++ thinking. So, concur on finding out "what's out there". BTW, was just looking at William Kempf's work w/ReadWrite locks; and who should show up prominently in the references: Doug Schmidt! Also, the suggestion to put together the lib doc in advance of the work makes total sense ... begin at the end. This is a valuable, non-trivial undertaking. I'm inclined to figure out how Boost/DocBook works so I can help. Somebody call me out on that! If we're going to manage processes; we ought to advantage ourselves by knowing whether the process is foreign (as in Julio's VC example) or one that is in our framework. One might say that this is mixing metaphors, managing processes vs being a process, but the ACE idea relates them. This is appropriate in lots of multi-daemon systems. So, to say that "it's quite complex and doesn't integrate with I/O streams" may be throwing the baby out with the bath water. (Though it is complex and..., and not nearly as tight as boost.) Otherwise, this is a cross platform packaging job on the pid_t fork_and_exec_piped(char **argv, int *infd, int *outfd, int *errfd) C function that we all have written. While this may be compelling in the context of Jonathan Turkanis "system_filter; system_source; system_sink" notion (and I don't write that off); the "process control" picture is bigger. There's the SIGCHLD handler issue (see Jeff Garland's archive reference). And the Runnable base class; which mates threads and processes in their derivations (and is more highly evolved than the Java counterpart). Consider "being a process". 
To get away from "C" main as fast as possible:
    class PTest : public app_base // derived from runnable
    int main(int argc, char **argv) { return main_delegate<App>(argc, argv); }
app_base offers
    register_sighandler(SIGINT, &exit_requested, &sig_handler<App>)
and set ability for private members
    std::istream& appin
    std::ostream& appout
    std::ostream& apperr
I would have written boost::stdin, but I'm a slow dinosaur, and can't write it until I read it.
Regards, Mark
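For readers who have not written the `fork_and_exec_piped` helper mentioned in this message, here is one possible POSIX-only version. Only the signature comes from the message; the body, the meaning given to the three out-parameters and the `read_all` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void close_pair(int fds[2]) { close(fds[0]); close(fds[1]); }

// Starts argv[0] with its stdin, stdout and stderr connected to pipes.
// On success returns the child's pid and stores the parent's ends:
// *infd writes to the child's stdin, *outfd and *errfd read its output.
// Returns -1 on failure. Assumes descriptors 0-2 are open in the parent.
pid_t fork_and_exec_piped(char **argv, int *infd, int *outfd, int *errfd)
{
    int in[2], out[2], err[2];
    if (pipe(in) != 0) return -1;
    if (pipe(out) != 0) { close_pair(in); return -1; }
    if (pipe(err) != 0) { close_pair(in); close_pair(out); return -1; }

    const pid_t pid = fork();
    if (pid < 0) {
        close_pair(in); close_pair(out); close_pair(err);
        return -1;
    }
    if (pid == 0) {
        // Child: wire the pipe ends to the standard descriptors, then exec.
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        dup2(err[1], STDERR_FILENO);
        close_pair(in); close_pair(out); close_pair(err);
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    // Parent: keep only the ends it will actually use.
    close(in[0]); close(out[1]); close(err[1]);
    *infd = in[1];
    *outfd = out[0];
    *errfd = err[0];
    return pid;
}

// Reads from a descriptor until end of file.
std::string read_all(int fd)
{
    std::string result;
    char buffer[4096];
    ssize_t n;
    while ((n = read(fd, buffer, sizeof buffer)) > 0)
        result.append(buffer, static_cast<std::size_t>(n));
    return result;
}
```

Reading one pipe to the end before touching the other, as `read_all` does, can deadlock if the child fills the unread pipe first. That is one of the problems a real library would need to address, for example with non-blocking reads.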

Mark Deric wrote:
If we're going to manage processes; we ought to advantage ourselves by knowing whether the process is foreign (as in Julio's VC example) or one that is in our framework. One might say that this is mixing metaphors,
I personally believe this is not a good goal. For one, the less Boost resembles a framework, the better Boost will integrate with the world in general. More importantly, it shouldn't matter whether a process is a Boost process or not. A process represents a set of useful interfaces: IO streams, synchronization party, IPC peer, virtual memory access, etc. If it is useful for a process to present something Boost-specific in its inter-process interface, then let it; but this is only a facet of the whole interface of the process, and does not make the process suddenly somehow become intrinsically a "Boost process." Please do not make the "framework" mistake. Aaron W. LaFramboise

Jeff Garland wrote:
One more reference. We had some discussion about something similar in mid-2004. AFAIK it never went anywhere:
It did go places, but I concentrated on creating a parse_pseudo_command_line utility rather than on creating the library:
    child::spawn_data child_data = child::parse_pseudo_command_line("ls ../*/*.cpp > files.txt");
See http://www.devel.lyx.org/~leeming/libs/child/doc/html/child.user_guide.html
I'm firmly of the opinion that such a utility, supporting all of the rules of some common language (in my case the Bourne shell), would be a very important addition to any Boost.Child package.
I've recently picked up the baton of what the library itself would need and started trying to discuss the possible alternatives. See here: http://www.devel.lyx.org/~leeming/process/
It seems to me that the hardest part of writing such a library isn't the code to spawn the process in the first place at all. Rather it's the code to monitor the status of a running process and to notify the rest of the code when it has exited. I suspect that an Alexandrescu-style policy-based design will be needed, but with a twist. Once a singleton process_monitor variable is created, invocation of any other policy should become an error.
Regards, Angus
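To make the idea of a command-line parsing utility concrete, here is a deliberately tiny sketch. It is not the `child::parse_pseudo_command_line` utility from the message and supports none of the Bourne shell rules beyond whitespace splitting and the `<` and `>` redirections; `spawn_request` and `parse_simple_command_line` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// What a launcher would need to know in order to start the child.
struct spawn_request {
    std::vector<std::string> argv;  // program name followed by its arguments
    std::string stdin_file;         // empty: inherit the parent's stdin
    std::string stdout_file;        // empty: inherit the parent's stdout
};

// Splits on whitespace and recognises "< file" and "> file".
// No quoting, escaping, globbing, pipes or variable expansion.
spawn_request parse_simple_command_line(const std::string& line)
{
    spawn_request request;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token == ">" || token == "<") {
            std::string target;
            if (!(in >> target))
                throw std::invalid_argument("redirection without a file name");
            (token == ">" ? request.stdout_file : request.stdin_file) = target;
        } else {
            request.argv.push_back(token);
        }
    }
    return request;
}
```

Note that the wildcard in `ls ../*/*.cpp > files.txt` is kept as a literal argument here. Deciding who expands it is exactly the kind of rule a complete utility would have to define.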

Angus Leeming wrote:
I've recently picked up the baton of what the library itself would need and started trying to discuss the possible alternatives. See here:
http://www.devel.lyx.org/~leeming/process/
It seems to me that the hardest part of writing such a library isn't the code to spawn the process in the first place at all. Rather it's the code to monitor the status of a running process and to notify the rest of the code when it has exited. I suspect that an Alexandrescu-style policy-based design will be needed, but with a twist. Once a singleton process_monitor variable is created, invocation of any other policy should become an error.
Could you elaborate a bit on how the policy-based design would look? In particular, are you saying it would resemble the Loki singleton, or just that it would involve policies of some sort? Also, I'm not sure what you mean by invoking the policies. Best, Jonathan

Jonathan Turkanis wrote:
Angus Leeming wrote:
I've recently picked up the baton of what the library itself would need and started trying to discuss the possible alternatives. See here:
http://www.devel.lyx.org/~leeming/process/
It seems to me that the hardest part of writing such a library isn't the code to spawn the process in the first place at all. Rather it's the code to monitor the status of a running process and to notify the rest of the code when it has exited. I suspect that an Alexandrescu-style policy-based design will be needed, but with a twist. Once a singleton process_monitor variable is created, invocation of any other policy should become an error.
Could you elaborate a bit on how the policy-based design would look? In particular, are you saying it would resemble the Loki singleton, or just that it would involve policies of some sort? Also, I'm not sure what you mean by invoking the policies.
Hi, Jonathan.
Ok, here goes.
Aaron LaFramboise (http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?action=browse&diff=3&id=Multiplexing/AaronsMultiplexingIdeas Equivalent URL: http://tinyurl.com/5q7ch) and Hugo Duncan (http://giallo.sf.net) both talk about a "demultiplexor" to monitor the state of all processes that have been spawned by the parent. Such a demultiplexor (perhaps "monitor" is a clearer name?) can use one of several different mechanisms to monitor these processes. For example:
* It could poll the state of all registered processes explicitly when requested to do so.
* It could install a signal handler for SIGCHLD signals and update a local store of completed children.
* It could spawn a thread to watch the state of a single child, this thread returning when the child exits.
No one of these policies is perfect, so it makes sense to provide all three as policies.
Similarly, how should such a demultiplexor inform interested parties that the child has completed? The obvious way is to store a Boost.Signal and emit it on process exit. But Boost.Signal is noncopyable, so it must be hidden inside some wrapper class with pointer-like copy semantics. More importantly, it cannot be used to communicate across threads. So, Boost.Signals won't cut it in a multithreaded world but is perfect in the single-threaded world.
Conclusion: the process demultiplexor should be a template with policies for the polling and for the callbacks.
However, the demultiplexor should also be a singleton. Probably. (I imagine a singleton with a Zombie lifetime policy; it can be resurrected once it has died but will not emit signals.) Moreover, once you've registered one 'flavour' of demultiplexor, all other flavours should be forbidden. I think this can be achieved with a child_config.h that is #included by all other child source files and which the user must use to specify the demultiplexor.
Does this make sense now?
Angus
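The "template with policies for the polling and for the callbacks" could look roughly like the sketch below. It covers only the first mechanism (explicit polling) and a single-threaded callback, assumes POSIX `waitpid()`, and uses `std::function` where the message talks about Boost.Signals. All names are hypothetical, and the singleton and "one flavour only" requirements are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Polling policy: ask the OS about one child without blocking.
struct waitpid_polling {
    // Returns true and fills `status` if the child has terminated.
    static bool has_exited(pid_t pid, int& status) {
        return waitpid(pid, &status, WNOHANG) == pid;
    }
};

// Callback policy for single-threaded programs: call the handler directly.
struct direct_callback {
    typedef std::function<void(pid_t, int)> handler;
    static void notify(const handler& h, pid_t pid, int status) { h(pid, status); }
};

template <class Polling, class Callback>
class process_monitor {
public:
    typedef typename Callback::handler handler;

    void watch(pid_t pid, const handler& h) { watched_[pid] = h; }
    std::size_t pending() const { return watched_.size(); }

    // Checks every registered child once; finished children are removed
    // and their handlers are notified. Returns how many finished.
    std::size_t poll() {
        std::vector<std::pair<pid_t, int> > finished;
        for (const auto& entry : watched_) {
            int status = 0;
            if (Polling::has_exited(entry.first, status))
                finished.push_back(std::make_pair(entry.first, status));
        }
        for (const auto& done : finished) {
            handler h = watched_[done.first];
            watched_.erase(done.first);  // erase first so handlers may call watch()
            Callback::notify(h, done.first, done.second);
        }
        return finished.size();
    }

private:
    std::map<pid_t, handler> watched_;
};
```

A SIGCHLD-based or thread-per-child flavour would replace `waitpid_polling`, and a thread-safe notification scheme would replace `direct_callback`, without changing `process_monitor` itself.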

Angus Leeming wrote:
Jonathan Turkanis wrote:
Could you elaborate a bit on how the policy-based design would look? In particular, are you saying it would resemble the Loki singleton, or just that it would involve policies of some sort? Also, I'm not sure what you mean by invoking the policies.
Hi, Jonathan.
Ok, here goes.
<snip detailed explanation>
Does this make sense now?
Yes, thanks. I guess I just lacked the relevant context. I haven't followed all the demultiplexing discussions even though I am quite interested in it.
Angus
Jonathan

On Sun, Feb 20, 2005 at 04:17:22PM -0700, Jeff Garland wrote:
On Sun, 20 Feb 2005 12:03:46 -0700, Jeff Garland wrote
Before you embark on the project you should be aware of some of the other libraries in the eco-system:
http://libexecstream.sourceforge.net/ http://pstreams.sourceforge.net/
Also, ACE has tools for doing this, although the interface is quite complex and doesn't integrate with I/O streams.
http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process__Ma...
http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__Process.htm...
And, of course, since you mention I/O stream integration (which I would expect), you will want to consider using the boost::iostream library. This library is now in CVS and will be part of 1.33. This will hopefully make the job easier.
http://home.comcast.net/~jturkanis/iostreams/libs/iostreams/doc/
One more reference. We had some discussion about something similar in mid-2004. AFAIK it never went anywhere:
Also http://www.basepath.com/aup/ex/classUx_1_1Process.html This is part of Ux, a POSIX wrapper for C++. jon -- "You can lead a horticulture but you can't make her think." - Dorothy Parker

Clearly this would be interesting to a lot of people. Since you've already written working code and presumably dealt with lots of practical issues of design and implementation, maybe it would be most helpful if you summarize your proposal in the form of library documentation. This would make it much easier to discuss such a thing. This would be helpful because lots of people are going to see different applications for this and will have a lot to say.
Someone is going to want it to look like threads.
Someone is going to want it to control asynchronous processes on networked machines.
Someone is going to want to pass data via [filtered] streams.
Someone else is going to feel that's too inefficient.
etc. etc.
Although this is looking simple right now, it's a big topic. Hang on to your hat.
Robert Ramey

On Sun, 20 Feb 2005 14:39:50 -0800, Robert Ramey wrote
Clearly this would be interesting to a lot of people.
Since you've already written working code and presumably dealt with lots of practical issues of design and implementation, maybe it would be most helpful if you summarize your proposal in the form of library documentation. This would make it much easier to discuss such a thing. This would be helpful because lots of people are
I agree -- this post has already received several responses indicating interest.
going to see different applications for this and will have a lot to say.
Someone is going to want it to look like threads.
Someone is going to want it to control asynchronous processes on networked machines.
Someone is going to want to pass data via [filtered] streams.
Someone else is going to feel that's too inefficient.
etc. etc.
Although this is looking simple right now, it's a big topic. Hang on to your hat.
While I agree that this will likely happen, I, for one, will argue for simplicity. We won't get all these features and we don't need them. A blocking implementation can be trivially wrapped with threads -- write that as an example; it doesn't need to be a feature of the library. The 'inefficient folks' should abandon C++ and go back to C. We don't have an async i/o design for standard streams and I don't think we should try to solve that with this library -- and anyone that complains about this should come forth with a proposal as far as I'm concerned.... Jeff
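As an illustration of wrapping a blocking call with threads, here is a minimal sketch. It uses `std::async` from later C++ standards rather than Boost.Threads, and `std::system()` stands in for whatever blocking launch function the library would provide; `run_blocking` and `run_async` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <future>
#include <string>
#include <sys/wait.h>

// Blocking call: runs the command through the shell and returns its
// exit code, or -1 if it could not be run or did not exit normally.
int run_blocking(const std::string& command)
{
    const int status = std::system(command.c_str());
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

// The same operation on its own thread; the caller collects the exit
// code later through the returned future.
// Caveat: POSIX does not require system() to be thread-safe, so a real
// library would use its own spawning code here.
std::future<int> run_async(const std::string& command)
{
    return std::async(std::launch::async, run_blocking, command);
}
```

The caller can keep working after `run_async("make")` returns and call `get()` on the future only when the result is needed, which is the point being made: the library itself can stay blocking.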

Julio M. Merino Vidal wrote:
Hi everyone,
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
But, before I start to do so, I would like to know if such library will be adequate to be integrated and/or if there is interest in it. FWIW, I've searched the mailing lists and found some people that said that they missed this functionality in Boost.
Hi,
One feature I intend to implement soon for the iostreams library is a set of filters and devices for starting child processes and accessing their standard input and output. I got the idea from the "unix filters" presented by JC van Winkle and John van Krieken at the 2003 ACCU conference. (The paper is called "GNIRTS ESAC REWOL: Bringing UNIX filters to iostream.") I have been planning to implement it along the lines of the libexecstream library mentioned by Jeff Garland. The components I am planning to support are as follows:
* system_filter - starts a child process and filters a character sequence by feeding it to the child process's standard input and reading the filtered sequence from the process's standard output.
* system_source - starts a child process and reads data from its standard output
* system_sink - starts a child process and writes data to its standard input
Clearly, if a library like the one you propose makes it into boost, I will use it to implement the above components. First I'd like to consider whether there are any features of such a library which cannot be adequately exposed by the above three components. If not, we could simply work together to write the above components using your code as a starting point, and the library might even make it into the upcoming 1.33 release.
Jonathan
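To show what a component like system_source might feel like to use, here is a sketch built on a plain `std::streambuf` and POSIX `popen()`. It does not use the Boost.Iostreams concepts and is not the planned implementation; `command_source_buf` and `command_source` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <stdexcept>
#include <streambuf>
#include <string>

// Stream buffer that reads whatever a child process writes to stdout.
class command_source_buf : public std::streambuf {
public:
    explicit command_source_buf(const std::string& command)
        : pipe_(popen(command.c_str(), "r"))
    {
        if (!pipe_)
            throw std::runtime_error("popen failed");
    }
    ~command_source_buf() { pclose(pipe_); }

protected:
    // Called by the stream when the get area is empty: refill it from the pipe.
    int_type underflow()
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = fread(buffer_, 1, sizeof buffer_, pipe_);
        if (n == 0)
            return traits_type::eof();
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    command_source_buf(const command_source_buf&);             // not copyable
    command_source_buf& operator=(const command_source_buf&);  // not assignable

    FILE* pipe_;
    char buffer_[4096];
};

// Convenience istream that owns the buffer.
class command_source : public std::istream {
public:
    explicit command_source(const std::string& command)
        : std::istream(static_cast<std::streambuf*>(nullptr)), buf_(command)
    {
        rdbuf(&buf_);  // also clears the error state set by the null buffer
    }

private:
    command_source_buf buf_;
};
```

Once constructed, a `command_source` can be read with `operator>>` or `std::getline` like any other input stream. A system_filter would additionally need a pipe to the child's standard input, which `popen()` alone cannot provide.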

Julio M. Merino Vidal wrote:
Any comments?
I'd like to add that I think a process library is important and would be quite useful for boost to have. Having dealt with running processes quite a lot, I'd say one of the most important features is providing a method of dealing with "misbehaving" processes. This may not seem obvious at first, but I learned this lesson the hard way several years ago. IMO, to be useful the library must provide a recovery mechanism for the case where a child process locks up and never produces any output or enters an infinite loop and never stops producing output. Here's some of the code I've written for dealing with processes: http://cvs.openwbem.org/cgi-bin/viewcvs.cgi/openwbem/src/common/OW_Exec.hpp?rev=1.27&content-type=text/vnd.viewcvs-markup and http://cvs.openwbem.org/cgi-bin/viewcvs.cgi/openwbem/src/common/OW_Exec.cpp?rev=1.45&content-type=text/vnd.viewcvs-markup Here is an interesting use case: In order to generate some entropy to seed a random number generator, lacking a better source, a lot of child processes are simultaneously executed and their output and timing are added to the entropy pool. All processes are given 10 seconds to run, and are terminated if they haven't exited. This may happen more often than not, since one of the commands being run is tcpdump, and if 100 packets aren't captured in the allotted time it will still be running. See http://cvs.openwbem.org/cgi-bin/viewcvs.cgi/openwbem/src/common/socket/OW_SSLCtxMgr.cpp?rev=1.42&content-type=text/vnd.viewcvs-markup specifically randomSourceCommands, RandomOutputGatherer and their use at the end of loadRandomness(). Given the Exec interface, it's quite simple to accomplish the task. It's also quite easy to integrate the code with iostreams, I've written a trivial streambuffer that uses the UnnamedPipe interface. -- Dan Nuffer
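The recovery mechanism asked for above can be approximated by waiting with a deadline and killing the child when the deadline passes. The sketch below is not the OpenWBEM `Exec` code; it assumes POSIX, polls with `waitpid(WNOHANG)`, and uses hypothetical names. It handles only the "never exits" case, not a child that floods its output.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum wait_outcome { exited_in_time, killed_after_timeout, wait_failed };

struct wait_result {
    wait_outcome outcome;
    int status;  // raw waitpid() status; not meaningful if outcome == wait_failed
};

// Waits for a child for at most `timeout`, then sends SIGKILL and reaps it.
wait_result wait_with_timeout(pid_t pid, std::chrono::milliseconds timeout)
{
    const std::chrono::steady_clock::time_point deadline =
        std::chrono::steady_clock::now() + timeout;
    wait_result result = { wait_failed, 0 };

    for (;;) {
        const pid_t r = waitpid(pid, &result.status, WNOHANG);
        if (r == pid) {
            result.outcome = exited_in_time;
            return result;
        }
        if (r < 0)
            return result;  // not our child, or already reaped
        if (std::chrono::steady_clock::now() >= deadline)
            break;
        usleep(10 * 1000);  // poll every 10 ms
    }

    // Deadline passed: terminate the child and collect its status.
    // If the child exited in the meantime, `status` still reports that.
    kill(pid, SIGKILL);
    if (waitpid(pid, &result.status, 0) == pid)
        result.outcome = killed_after_timeout;
    return result;
}
```

A gentler variant would send SIGTERM first and escalate to SIGKILL only after a second grace period.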

On 02/20/2005 10:35 PM, Dan Nuffer wrote: [snip]
Having dealt with running processes quite a lot, I'd say one of the most important features is providing a method of dealing with "misbehaving" processes. This may not seem obvious at first, but I learned this lesson the hard way several years ago. IMO, to be useful the library must provide a recovery mechanism for the case where a child process locks up and never produces any output or enters an infinite loop and never stops producing output.
By "locks up" did you mean deadlock as one instance of this? If so, would a descriptor_lock like that described here: http://groups-beta.google.com/group/comp.lang.c++.moderated/msg/c4c3bf55960e... be useful for recovery? What I had in mind was a table in each thread's local storage which contains blocked locks with requested and acquired resources. This could be used to detect cycles, and hence deadlocks.
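The cycle detection mentioned here can be sketched with a simple wait-for table. This is not the descriptor_lock from the linked post; it is a single shared table rather than one per thread, and it assumes each resource has at most one owner and each blocked thread waits for exactly one resource. All names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>

typedef int thread_id;
typedef int resource_id;

struct lock_table {
    std::map<resource_id, thread_id> owner;      // acquired resources
    std::map<thread_id, resource_id> requested;  // what each blocked thread waits for

    // Follows the chain "thread waits for resource owned by thread ...".
    // Returns true if the chain starting at `start` runs into a cycle,
    // meaning `start` can never make progress.
    bool deadlocked(thread_id start) const
    {
        std::set<thread_id> visited;
        thread_id current = start;
        for (;;) {
            if (!visited.insert(current).second)
                return true;  // thread seen before: cycle
            const auto request = requested.find(current);
            if (request == requested.end())
                return false;  // current thread is not blocked
            const auto holder = owner.find(request->second);
            if (holder == owner.end())
                return false;  // requested resource is free
            current = holder->second;
        }
    }
};
```

Because every thread waits for at most one resource, the wait-for graph has at most one outgoing edge per thread, so following the chain is enough and no general graph search is needed.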

On Sun, Feb 20, 2005 at 06:42:24PM +0100, Julio M. Merino Vidal wrote:
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm very interested in this. I wrote a utility to do something similar, although currently the process creation code is done by the streambuf class. Angus Leeming separated the process-control code from the stream buffer code which I think was the basis for the code he's using in LyX now (is that right Angus?) Unfortunately I haven't had time to commit his changes to CVS so you can't see his code anywhere yet (sorry, Angus, I've been extremely busy at work and adding boost::shared_ptr to GCC). See http://pstreams.sf.net/ for the current code, and imagine that there is a separate "process" class that pstreambuf uses to create and control the child process. jon -- "Strange how potent cheap music is." - Noël Coward

On Mon, Feb 21, 2005 at 01:12:43PM +0000, Jonathan Wakely wrote:
On Sun, Feb 20, 2005 at 06:42:24PM +0100, Julio M. Merino Vidal wrote:
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm very interested in this. I wrote a utility to do something similar, although currently the process creation code is done by the streambuf class. Angus Leeming separated the process-control code from the stream buffer code which I think was the basis for the code he's using in LyX now (is that right Angus?) Unfortunately I haven't had time to commit his changes to CVS so you can't see his code anywhere yet (sorry, Angus, I've been extremely busy at work and adding boost::shared_ptr to GCC).
See http://pstreams.sf.net/ for the current code, and imagine that there is a separate "process" class that pstreambuf uses to create and control the child process.
URL for the code (all in one header): http://cvs.sf.net/viewcvs.py/pstreams/pstreams/pstream.h jon -- "..." - Anon. [fnords present in original]

Jonathan Wakely wrote:
On Sun, Feb 20, 2005 at 06:42:24PM +0100, Julio M. Merino Vidal wrote:
Among the classes I wrote, there are some that provide an abstraction layer to launch other (child) processes. Once the process is launched, the code can access its standard input/output/error streams by using the standard iostreams framework.
I'm very interested in this. I wrote a utility to do something similar, although currently the process creation code is done by the streambuf class. Angus Leeming separated the process-control code from the stream buffer code which I think was the basis for the code he's using in LyX now (is that right Angus?)
No, I put this on the back burner. Now that the port of LyX to win32 is almost complete, it's finally back on the front burner as the final step in the process (no pun intended ;-)).
Unfortunately I haven't had time to commit his changes to CVS so you can't see his code anywhere yet (sorry, Angus, I've been extremely busy at work and adding boost::shared_ptr to GCC).
I understand. Real life intrudes here too.
See http://pstreams.sf.net/ for the current code, and imagine that there is a separate "process" class that pstreambuf uses to create and control the child process.
jon
Angus

participants (12)
- Aaron W. LaFramboise
- Angus Leeming
- christopher diggins
- Dan Nuffer
- Hartmut Kaiser
- Jeff Garland
- Jonathan Turkanis
- Jonathan Wakely
- Julio M. Merino Vidal
- Larry Evans
- Mark Deric
- Robert Ramey