[interprocess] Boost.Thread developers willing to help on interprocess condition vars?

Hi to all,

Recently some bug reports were issued against Boost.Interprocess' condition variable. Since my multithreading skills are limited at best, I was wondering if Boost.Thread developers/threading experts would like to help on rewriting the condition variable. Let me explain some issues:

Native vs. emulated condition vars
----------------------------------

Some POSIX systems have process-shared condition variables. On those systems interprocess_condition is a wrapper around pthread_cond_t. On the remaining platforms (including Windows and Mac OS) an emulated condition variable is used.

Emulated condition vars
-----------------------

The emulated condition variables store integers and use atomic operations and busy waits/yields to emulate process-shared condition variables. The reasons for using integers and busy waits are:

-> Independence from address mapping.
-> The condition variable should be compatible with memory-mapped files. A user can map a file, build a condvar in it, unmap the file, reboot the system, map the file again and continue working.

Since there is no atomic operations library in Boost I use my own atomic operations (taken from apache):

http://svn.boost.org/svn/boost/trunk/boost/interprocess/detail/atomic.hpp

The emulated condition variable is here:

http://svn.boost.org/svn/boost/trunk/boost/interprocess/sync/emulation/inter...

So if you are a threading expert, Boost.Interprocess users and I need your help to write a robust process-shared condition variable emulation.

Regards,
Ion
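To make the emulation idea concrete, here is a heavily simplified sketch of the scheme. This is NOT the actual Interprocess source: it only shows notify_all, has no timed wait, and the GCC __sync builtins stand in for the functions in detail/atomic.hpp.

#include <boost/cstdint.hpp>
#include <sched.h>                                 // sched_yield()

// Sketch only: an address-independent condition variable whose whole state is
// a single 32-bit generation counter living in the shared/mapped memory.
struct emulated_condition
{
   volatile boost::uint32_t m_generation;          // bumped by every notify_all()

   template <class ScopedLock>
   void wait(ScopedLock &lock)
   {
      // Remember the generation seen while still holding the external mutex.
      const boost::uint32_t gen = __sync_fetch_and_add(&m_generation, 0u);
      lock.unlock();
      // Busy wait / yield until some notifier advances the generation counter.
      while(__sync_fetch_and_add(&m_generation, 0u) == gen)
         sched_yield();
      lock.lock();
   }

   void notify_all()
   {
      __sync_fetch_and_add(&m_generation, 1u);      // releases every spinning waiter
   }
};

Because the state is just an integer inside the mapped segment (no pointers, no kernel handles), the same bytes keep working after the file is unmapped, remapped at another address, or even after a reboot.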

Ion Gaztañaga <igaztanaga@gmail.com> writes:
Hi to all,
Recently some bug reports were issued against Boost.Interprocess' condition variable. Since my multithreading skills are limited at best, I was wondering if Boost.Thread developers/threading experts would like to help on rewriting the condition variable.
Sure, I'll help. Are you after a general rewrite, or just a fix for ticket 1231?

On Windows, I'd have thought we could use process-shared sync objects, but that doesn't solve the general emulation problem.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Ion Gaztañaga <igaztanaga@gmail.com> writes:
Hi to all,
Recently some bug reports were issued against Boost.Interprocess' condition variable. Since my multithreading skills are limited at best, I was wondering if Boost.Thread developers/threading experts would like to help on rewriting the condition variable.
Sure, I'll help. Are you after a general rewrite, or just a fix for ticket 1231?
Whatever you consider more appropriate. I'm pretty sure that instead of just doing yields you could come up with a more efficient approach ;-) Now seriously, I think a complete rewrite by a threading expert would be the better option, if the effort is manageable.
On Windows, I'd have thought we could use process-shared sync objects, but that doesn't solve the general emulation problem.
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work?
Anthony
Thanks for your help, Ion

Ion Gaztañaga wrote:
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work?
Is this guaranteed by POSIX? Which specific cases need to be supported? How would a condition variable that is rebooted while four threads are waiting and one thread is somewhere inside notify_one be able to resume operation?

Peter Dimov wrote:
Ion Gaztañaga wrote:
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work?
Is this guaranteed by POSIX? Which specific cases need to be supported? How would a condition variable that is rebooted while four threads are waiting and one thread is somewhere inside notify_one be able to resume operation?
At least the classic "UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications" (W. Richard Stevens) has a chapter explaining POSIX message queues and an implementation built on memory-mapped I/O plus POSIX mutexes and condition variables.

As for whether this is officially supported by POSIX: I thought it was, but I can't find explicit wording on it. Is there any POSIX expert out there who can settle this? It seems that Solaris, Linux and other UNIXes support it without problems.

Book TOC excerpt taken from http://www.kohala.com/start/unpv22e/unpv22e.html:

Chapter 5. Posix Message Queues 75
  5.1 Introduction 75
  5.2 mq_open, mq_close, and mq_unlink Functions 76
  5.3 mq_getattr and mq_setattr Functions 79
  5.4 mq_send and mq_receive Functions 82
  5.5 Message Queue Limits 86
  5.6 mq_notify Function 87
  5.7 Posix Realtime Signals 98
  5.8 Implementation Using Memory-Mapped I/O 106
  5.9 Summary 126

You can download the book's source code (including the example that builds the mq_xxx functions on memory-mapped files and POSIX condition variables/mutexes) from that page. I've attached the book's code for "mq_open", which uses pthread mutexes/conditions plus memory-mapped files.

Regards,
Ion

/* include mq_open1 */
#include "unpipc.h"
#include "mqueue.h"
#include <stdarg.h>

#define MAX_TRIES 10    /* for waiting for initialization */

struct mymq_attr defattr = { 0, 128, 1024, 0 };

mymqd_t
mymq_open(const char *pathname, int oflag, ...)
{
    int i, fd, nonblock, created, save_errno;
    long msgsize, filesize, index;
    va_list ap;
    mode_t mode;
    int8_t *mptr;
    struct stat statbuff;
    struct mymq_hdr *mqhdr;
    struct mymsg_hdr *msghdr;
    struct mymq_attr *attr;
    struct mymq_info *mqinfo;
    pthread_mutexattr_t mattr;
    pthread_condattr_t cattr;

    created = 0;
    nonblock = oflag & O_NONBLOCK;
    oflag &= ~O_NONBLOCK;
    mptr = (int8_t *) MAP_FAILED;
    mqinfo = NULL;

again:
    if (oflag & O_CREAT) {
        va_start(ap, oflag);        /* init ap to final named argument */
        mode = va_arg(ap, va_mode_t) & ~S_IXUSR;
        attr = va_arg(ap, struct mymq_attr *);
        va_end(ap);

        /* open and specify O_EXCL and user-execute */
        fd = open(pathname, oflag | O_EXCL | O_RDWR, mode | S_IXUSR);
        if (fd < 0) {
            if (errno == EEXIST && (oflag & O_EXCL) == 0)
                goto exists;        /* already exists, OK */
            else
                return((mymqd_t) -1);
        }
        created = 1;

        /* first one to create the file initializes it */
        if (attr == NULL)
            attr = &defattr;
        else {
            if (attr->mq_maxmsg <= 0 || attr->mq_msgsize <= 0) {
                errno = EINVAL;
                goto err;
            }
        }
/* end mq_open1 */
/* include mq_open2 */
        /* calculate and set the file size */
        msgsize = MSGSIZE(attr->mq_msgsize);
        filesize = sizeof(struct mymq_hdr) + (attr->mq_maxmsg *
                       (sizeof(struct mymsg_hdr) + msgsize));
        if (lseek(fd, filesize - 1, SEEK_SET) == -1)
            goto err;
        if (write(fd, "", 1) == -1)
            goto err;

        /* memory map the file */
        mptr = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
        if (mptr == MAP_FAILED)
            goto err;

        /* allocate one mymq_info{} for the queue */
        if ( (mqinfo = malloc(sizeof(struct mymq_info))) == NULL)
            goto err;
        mqinfo->mqi_hdr = mqhdr = (struct mymq_hdr *) mptr;
        mqinfo->mqi_magic = MQI_MAGIC;
        mqinfo->mqi_flags = nonblock;

        /* initialize header at beginning of file */
        /* create free list with all messages on it */
        mqhdr->mqh_attr.mq_flags = 0;
        mqhdr->mqh_attr.mq_maxmsg = attr->mq_maxmsg;
        mqhdr->mqh_attr.mq_msgsize = attr->mq_msgsize;
        mqhdr->mqh_attr.mq_curmsgs = 0;
        mqhdr->mqh_nwait = 0;
        mqhdr->mqh_pid = 0;
        mqhdr->mqh_head = 0;
        index = sizeof(struct mymq_hdr);
        mqhdr->mqh_free = index;
        for (i = 0; i < attr->mq_maxmsg - 1; i++) {
            msghdr = (struct mymsg_hdr *) &mptr[index];
            index += sizeof(struct mymsg_hdr) + msgsize;
            msghdr->msg_next = index;
        }
        msghdr = (struct mymsg_hdr *) &mptr[index];
        msghdr->msg_next = 0;       /* end of free list */

        /* initialize mutex & condition variable */
        if ( (i = pthread_mutexattr_init(&mattr)) != 0)
            goto pthreaderr;
        pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
        i = pthread_mutex_init(&mqhdr->mqh_lock, &mattr);
        pthread_mutexattr_destroy(&mattr);      /* be sure to destroy */
        if (i != 0)
            goto pthreaderr;

        if ( (i = pthread_condattr_init(&cattr)) != 0)
            goto pthreaderr;
        pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
        i = pthread_cond_init(&mqhdr->mqh_wait, &cattr);
        pthread_condattr_destroy(&cattr);       /* be sure to destroy */
        if (i != 0)
            goto pthreaderr;

        /* initialization complete, turn off user-execute bit */
        if (fchmod(fd, mode) == -1)
            goto err;
        close(fd);
        return((mymqd_t) mqinfo);
    }
/* end mq_open2 */
/* include mq_open3 */
exists:
    /* open the file then memory map */
    if ( (fd = open(pathname, O_RDWR)) < 0) {
        if (errno == ENOENT && (oflag & O_CREAT))
            goto again;
        goto err;
    }

    /* make certain initialization is complete */
    for (i = 0; i < MAX_TRIES; i++) {
        if (stat(pathname, &statbuff) == -1) {
            if (errno == ENOENT && (oflag & O_CREAT)) {
                close(fd);
                goto again;
            }
            goto err;
        }
        if ((statbuff.st_mode & S_IXUSR) == 0)
            break;
        sleep(1);
    }
    if (i == MAX_TRIES) {
        errno = ETIMEDOUT;
        goto err;
    }
    filesize = statbuff.st_size;
    mptr = mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mptr == MAP_FAILED)
        goto err;
    close(fd);

    /* allocate one mymq_info{} for each open */
    if ( (mqinfo = malloc(sizeof(struct mymq_info))) == NULL)
        goto err;
    mqinfo->mqi_hdr = (struct mymq_hdr *) mptr;
    mqinfo->mqi_magic = MQI_MAGIC;
    mqinfo->mqi_flags = nonblock;
    return((mymqd_t) mqinfo);

pthreaderr:
    errno = i;
err:
    /* don't let following function calls change errno */
    save_errno = errno;
    if (created)
        unlink(pathname);
    if (mptr != MAP_FAILED)
        munmap(mptr, filesize);
    if (mqinfo != NULL)
        free(mqinfo);
    close(fd);
    errno = save_errno;
    return((mymqd_t) -1);
}
/* end mq_open3 */

mymqd_t
Mymq_open(const char *pathname, int oflag, ...)
{
    mymqd_t mqd;
    va_list ap;
    mode_t mode;
    struct mymq_attr *attr;

    if (oflag & O_CREAT) {
        va_start(ap, oflag);        /* init ap to final named argument */
        mode = va_arg(ap, va_mode_t);
        attr = va_arg(ap, struct mymq_attr *);
        if ( (mqd = mymq_open(pathname, oflag, mode, attr)) == (mymqd_t) -1)
            err_sys("mymq_open error for %s", pathname);
        va_end(ap);
    } else {
        if ( (mqd = mymq_open(pathname, oflag)) == (mymqd_t) -1)
            err_sys("mymq_open error for %s", pathname);
    }
    return(mqd);
}

Ion Gaztañaga <igaztanaga@gmail.com> writes:
Peter Dimov wrote:
Ion Gaztañaga wrote:
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work? Is this guaranteed by POSIX? Which specific cases need to be supported? How would a condition variable that is rebooted while four threads are waiting and one thread is somewhere inside notify_one be able to resume operation?
At least the classic "UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications" (W. Richard Stevens) has a chapter explaining POSIX message queues and an implementation built on memory-mapped I/O plus POSIX mutexes and condition variables.
As for whether this is officially supported by POSIX: I thought it was, but I can't find explicit wording on it. Is there any POSIX expert out there who can settle this? It seems that Solaris, Linux and other UNIXes support it without problems.
POSIX mutexes that have PTHREAD_PROCESS_SHARED "can be operated on by any thread that has access to the memory where the mutex is allocated, even if the mutex is allocated in memory that is shared by multiple processes". If that memory is backed by a memory-mapped file, then I guess one way to get access to the memory is to map the file. Even so, I wouldn't like to guarantee it.

However, I would not expect the mutex to survive a reboot --- there might still be data in the file, but I would expect a process-shared mutex to have at least some portion that was a handle to a kernel object, and the kernel object would go away with a reboot. I expect the same to be true of condition variables.

There is certainly nothing in the POSIX spec that says you can save a mutex to disk and read it back again. In fact, there is explicit wording in the rationale to suggest that some implementations may just store a pointer to the actual data structure --- in which case, writing it to disk and reading it back is not going to work, as the pointer will be invalid.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Ion Gaztañaga <igaztanaga@gmail.com> writes:
Peter Dimov wrote:
Ion Gaztañaga wrote:
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work? [snip] However, I would not expect the mutex to survive a reboot --- there might still be data in the file, but I would expect a process-shared mutex to have at least some portion that was a handle to a kernel object, and the kernel object would go away with a reboot.
I have no idea what the POSIX spec requires or what most implementations actually do. But I believe that the Linux implementation, using the futex() system call, probably can survive a reboot as it has no permanent kernel component. The kernel only gets involved when a process finds the mutex locked, and needs to wait. See 'man 2 futex'. Of course the pthread_* functions, which wrap the futex() functionality, may make it impossible to exploit this behaviour. Phil.
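To illustrate why the kernel side is transient, here is a minimal (and deliberately naive) futex-based lock. The mutex is just a uint32_t word inside the mapped file, so an unlocked mutex persists across a reboot simply as the value 0. This is only a sketch, assuming Linux and the GCC builtins; it is not what glibc's pthread_mutex_t actually does.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>

// Thin wrapper over the futex system call (glibc provides no wrapper for it).
static long futex(volatile uint32_t *addr, int op, uint32_t val)
{
   return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void file_mutex_lock(volatile uint32_t *m)          // *m: 0 = unlocked, 1 = locked
{
   while(!__sync_bool_compare_and_swap(m, 0u, 1u))  // try to grab it purely in user space
      futex(m, FUTEX_WAIT, 1u);                     // otherwise sleep while the word still reads 1
}

void file_mutex_unlock(volatile uint32_t *m)
{
   __sync_lock_release(m);                          // store 0 with release semantics
   futex(m, FUTEX_WAKE, 1u);                        // wake at most one sleeping waiter
}

A word left at 1 by a crash or a reboot is of course still a problem: the mutex looks locked, but nothing in the kernel remembers who held it.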

"Phil Endecott" <spam_from_boost_dev@chezphil.org> writes:
Anthony Williams wrote:
Ion Gaztañaga <igaztanaga@gmail.com> writes:
Peter Dimov wrote:
Ion Gaztañaga wrote:
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work? [snip] However, I would not expect the mutex to survive a reboot --- there might still be data in the file, but I would expect a process-shared mutex to have at least some portion that was a handle to a kernel object, and the kernel object would go away with a reboot.
I have no idea what the POSIX spec requires or what most implementations actually do. But I believe that the Linux implementation, using the futex() system call, probably can survive a reboot as it has no permanent kernel component. The kernel only gets involved when a process finds the mutex locked, and needs to wait. See 'man 2 futex'. Of course the pthread_* functions, which wrap the futex() functionality, may make it impossible to exploit this behaviour.
I can see that working. Actually, my idea for using a UUID on Windows would work the same.

However, this would only work for unlocked mutexes --- a locked mutex would potentially remain locked, but with no clear owner, or it might be unlocked, depending on how the state was stored.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
However, this would only work for unlocked mutexes --- a locked mutex would potentially remain locked, but with no clear owner, or it might be unlocked, depending on how the state was stored.
Well, leaving a mutex locked in a file does not seem a very good idea. That would indicate that the process that acquired the mutex has died, so I would assume the mapped file is corrupted. If we can make this work for locked mutexes, that would be good enough IMHO.

Regards,
Ion

Ion Gaztañaga wrote:
Anthony Williams wrote:
However, this would only work for unlocked mutexes --- a locked mutex would potentially remain locked, but with no clear owner, or it might be unlocked, depending on how the state was stored.
Well, leaving a mutex locked in a file does not seem a very good idea. That would indicate that the process that acquired the mutex has died, so I would assume the mapped file is corrupted. If we can make this work for locked mutexes, that would be good enough IMHO.
Correction: Support for *unlocked* mutexes would be good enough. Regards, Ion

Anthony Williams wrote:
However, I would not expect the mutex to survive a reboot --- there might still be data in the file, but I would expect a process-shared mutex to have at least some portion that was a handle to a kernel object, and the kernel object would go away with a reboot.
I expect the same to be true of condition variables. There is certainly nothing in the POSIX spec that says you can save a mutex to disk and read it back again. In fact, there is explicit wording in the rationale to suggest that some implementations may just store a pointer to the actual data structure --- in which case, writing it to disk and reading it back is not going to work, as the pointer will be invalid.
Even if POSIX does not officially guarantee it, I think filesystem persistence is very useful for mutexes, as the message queue example shows. Interprocess users are pretty happy with it. So far, every system I've found that supports process-shared mutexes also supports the mapped-file approach. I wouldn't be asking for your help if the task were easy ;-)
Anthony
Regards, Ion

Ion Gaztañaga <igaztanaga@gmail.com> writes:
Anthony Williams wrote:
Ion Gaztañaga <igaztanaga@gmail.com> writes:
Hi to all,
Recently some bug reports were issued against Boost.Interprocess' condition variable. Since my multithreading skills are limited at best, I was wondering if Boost.Thread developers/threading experts would like to help on rewriting the condition variable.
Sure, I'll help. Are you after a general rewrite, or just a fix for ticket 1231?
Whatever you consider more appropriate. I'm pretty sure that instead of just doing yields you could come up with a more efficient approach ;-) Now seriously, I think a complete rewrite by a threading expert would be the better option, if the effort is manageable.
I'll see what I can come up with. If there are no native process-shared sync objects, I'd be hard-pushed to think of anything better than yielding.
On Windows, I'd have thought we could use process-shared sync objects, but that doesn't solve the general emulation problem.
Does this support building those objects in mapped files, rebooting the system, mapping the file again and continuing to work?
Sort-of. Named mutexes (for example) can be shared across processes --- every process that tries to create a mutex with a specified name gets the same mutex. If you store the name in a file then you'll get the same mutex back when you read back the file (though it will always be unlocked after a reboot). If you use a UUID as the name, then you know it won't clash with anything else.

What are the requirements with respect to memory-mapped files? Does the locked state (and thus the owner) of the mutex need to be sticky? How does this work with a process-shared pthread_mutex_t?

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
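For what it's worth, a rough sketch of that idea (the names here are invented for illustration; this is not existing Boost code): the mapped file stores only the generated name, each process re-creates or opens the kernel object from that name, and after a reboot the name is still in the file so the mutex simply comes back as a fresh, unlocked object.

#include <windows.h>

// Sketch only: error handling omitted.
struct persistent_mutex_header        // lives at a fixed offset inside the mapped file
{
   char name[64];                     // e.g. "Global\\{...uuid...}", written once at creation time
};

HANDLE open_file_backed_mutex(const persistent_mutex_header *hdr)
{
   // Creates the named mutex if it does not exist yet, otherwise opens the
   // existing one; FALSE means the calling thread does not take ownership.
   return CreateMutexA(NULL, FALSE, hdr->name);
}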

Ion Gaztañaga wrote:
Since there is no atomic operations library in Boost I use my own atomic operations (taken from apache):
http://svn.boost.org/svn/boost/trunk/boost/interprocess/detail/atomic.hpp
It would be great to make this a publicly-visible library, rather than an implementation detail of interprocess. That would avoid duplication of effort in e.g. shared_ptr. Note that recent-ish versions of gcc have builtins for atomic operations on many platforms. Phil.

Phil Endecott wrote:
Ion Gaztañaga wrote:
Since there is no atomic operations library in Boost I use my own atomic operations (taken from apache):
http://svn.boost.org/svn/boost/trunk/boost/interprocess/detail/atomic.hpp
It would be great to make this a publicly-visible library, rather than an implementation detail of interprocess. That would avoid duplication of effort in e.g. shared_ptr.
Note that recent-ish versions of gcc have builtins for atomic operations on many platforms.
Have you seen Matt Calabrese's work? http://lists.boost.org/Archives/boost/2007/07/124651.php - Michael Marcin

Michael Marcin wrote:
Phil Endecott wrote:
Ion Gaztañaga wrote:
Since there is no atomic operations library in Boost I use my own atomic operations (taken from apache):
http://svn.boost.org/svn/boost/trunk/boost/interprocess/detail/atomic.hpp
It would be great to make this a publicly-visible library, rather than an implementation detail of interprocess. That would avoid duplication of effort in e.g. shared_ptr.
Note that recent-ish versions of gcc have builtins for atomic operations on many platforms.
Have you seen Matt Calabrese's work?
No, I hadn't seen that; thanks for pointing it out. While trying to find the 'asm', or whatever it uses, I must say that I have never seen quite so many levels of #includes! I have spent a while looking but I can't find what's actually at the bottom of it all...

Based on my recent experience of adding ARM asm to shared_ptr, all that I would want from an atomics library is something that wraps the gcc builtins when they exist, otherwise falling back to something like Ion's existing apache-based code - with some macros or other indication of whether the implementation of each function is asm or a more expensive emulation.

There may well be applications for a library like Matt's, but I would prefer it to be implemented on top of a more primitive, but accessible, layer.

Regards,
Phil.
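Something along these lines is all I have in mind; the function names and the macro are invented here purely to show the shape of it (GCC grew the __sync builtins around version 4.1):

#include <boost/cstdint.hpp>

#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 401)
// Real hardware atomics: callers can test this macro to know they are not
// paying for a more expensive emulation.
#  define BOOST_ATOMIC_IS_NATIVE 1

inline boost::uint32_t atomic_inc32(volatile boost::uint32_t *mem)
{  return __sync_fetch_and_add(mem, boost::uint32_t(1));  }    // returns the old value

inline boost::uint32_t atomic_cas32
   (volatile boost::uint32_t *mem, boost::uint32_t with, boost::uint32_t cmp)
{  return __sync_val_compare_and_swap(mem, cmp, with);  }      // returns the old value

#else
// Fall back to the apache-derived code in boost/interprocess/detail/atomic.hpp
// (or, in the worst case, a lock-based emulation).
#  define BOOST_ATOMIC_IS_NATIVE 0
#endif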

Phil Endecott wrote:
Ion Gaztañaga wrote:
Since there is no atomic operations library in Boost I use my own atomic operations (taken from apache):
http://svn.boost.org/svn/boost/trunk/boost/interprocess/detail/atomic.hpp
It would be great to make this a publicly-visible library, rather than an implementation detail of interprocess. That would avoid duplication of effort in e.g. shared_ptr.
Note that recent-ish versions of gcc have builtins for atomic operations on many platforms.
I absolutely agree. Interprocess will suffer big portability problems with the current approach, but the only atomic functions I've seen officially in Boost are those implemented in shared_ptr. Ideally, we should have some basic atomic library/internal header (even if every atomic function issues a full barrier on each operation) ASAP. I really don't want to maintain something that I don't understand ;-)
Phil.
Regards, Ion
participants (5)
- Anthony Williams
- Ion Gaztañaga
- Michael Marcin
- Peter Dimov
- Phil Endecott