boost::interprocess shared memory performance
This is my first experience with using shared memory for anything more than trivial IPC. Thanks again to Ion Gaztañaga for getting the library working on FreeBSD. I'm developing initially on OS X 10.5.6 with boost_trunk, but the eventual target platform is FreeBSD; I'm just an old Mac programmer used to cushy development tools.

I originally used managed_mapped_file and got everything working, but I was disappointed by the performance. Profiling with Shark, I found that the application was spending a lot of time in msync, which according to its man(2) page is used to synchronize mapped memory with the filesystem. It makes sense to me that the interprocess containers I was creating could have a lot of file I/O overhead, so managed_mapped_file was probably not a good choice.

So I rewrote the code to use managed_shared_memory instead of managed_mapped_file, thinking that it would eliminate the file I/O and therefore be faster. However, I am surprised that it is not much faster, and when I profile with Shark on OS X I see that it is still spending a lot of time in msync, specifically whenever a managed_shared_memory object is destroyed (in boost::interprocess::mapped_region::flush(unsigned long, unsigned long), which is called from basic_managed_shared_memory's destructor).

Does managed_shared_memory really need to call msync? I see that I should optimize my code to cache the managed_shared_memory objects so that fewer creates/destroys are necessary, but creation and destruction are still going to happen fairly frequently, and I wonder whether this expensive msync call is really needed.

In the tradition of coder forums everywhere, someone will probably ask what I'm trying to accomplish and whether there may be a better way. Suggestions welcome. I'm writing a little CGI-driven database utility that queries data stored in a filesystem directory using a simple query language. I would like to keep indexes of the data to speed query resolution. The utility is old-school CGI, so all its resources (such as indexes) have to be instantiated into memory each time the CGI process is started. I could write the indexes to files, but then I incur an expensive de/serialization overhead. My intention was to keep the indexes as ready-to-use interprocess::maps in shared memory, to be used by all invocations of the CGI. It works, but the performance of the shared memory is poor enough that I'm not getting much of an improvement over just doing a brute-force search through the data files.

All suggestions appreciated!

Andy
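For reference, here is a minimal sketch of the kind of index I'm keeping. The segment name "query_index_shm" and the int-key-to-file-offset layout are just illustrative, not my real schema:

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/containers/map.hpp>
    #include <boost/interprocess/allocators/allocator.hpp>
    #include <functional>
    #include <utility>

    namespace bip = boost::interprocess;

    // Map an integer key to an offset into a data file.
    typedef std::pair<const int, long> ValueType;
    typedef bip::allocator<ValueType,
        bip::managed_shared_memory::segment_manager> ShmAllocator;
    typedef bip::map<int, long, std::less<int>, ShmAllocator> IndexMap;

    int main()
    {
        // Open the segment if a previous invocation created it,
        // otherwise create it (1 MB for the sketch).
        bip::managed_shared_memory segment(
            bip::open_or_create, "query_index_shm", 1024 * 1024);

        // Find the index left behind by an earlier process,
        // or construct a fresh one.
        IndexMap *index = segment.find_or_construct<IndexMap>("index")
            (std::less<int>(), segment.get_segment_manager());

        (*index)[42] = 1024;  // key 42 lives at offset 1024 in the data file
        return 0;
    }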
Andy Wiese wrote:
Does managed_shared_memory really need to call msync?
I don't know. Maybe even managed_mapped_file shouldn't call flush() in the destructor, because the OS should handle the changes made to the memory segment, perhaps keeping the data in memory. The question is perhaps whether closing a file should call fflush(), and whether Interprocess should do the same. Anyway, it is possible that unmapping implicitly provokes an msync. Can you try commenting out the call to flush() in mapped_region's destructor and measure again?

And just a question: if your bottleneck is msync, does this mean that you are creating and destroying a lot of managed_shared_memory / managed_mapped_file instances? That does not seem very performance-friendly, since you will be mapping and unmapping pages, which is not a lightweight operation.
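For example, keeping a single mapping alive for the lifetime of the process pays the map/unmap (and flush) cost only once. A sketch, assuming a segment name like yours:

    #include <boost/interprocess/managed_shared_memory.hpp>

    namespace bip = boost::interprocess;

    // One mapping per process: constructed on first use, destroyed
    // (and therefore flushed) only once, at process exit.
    bip::managed_shared_memory &segment()
    {
        static bip::managed_shared_memory s(
            bip::open_or_create, "query_index_shm", 1024 * 1024);
        return s;
    }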
I'm writing a little CGI-driven database utility that queries data stored in a filesystem directory using a simple query language. I would like to keep indexes of the data to speed query resolution. The utility is old-school CGI, so all its resources (such as indexes) have to be instantiated into memory each time the CGI process is started. I could write the indexes to files, but then I incur an expensive de/serialization overhead. My intention was to keep the indexes as ready-to-use interprocess::maps in shared memory, to be used by all invocations of the CGI. It works, but the performance of the shared memory is poor enough that I'm not getting much of an improvement over just doing a brute-force search through the data files.
Ok, try commenting out the flush() call and tell me if the difference is appreciable.

Regards,

Ion
On Sun, Dec 28, 2008 at 01:02:22PM +0100, Ion Gaztañaga wrote:
Andy Wiese wrote:
Does managed_shared_memory really need to call msync?
I don't know. Maybe even managed_mapped_file shouldn't call flush() in
Sorry to jump into the discussion. Here's a quote from the manual:

    msync() flushes changes made to the in-core copy of a file that was
    mapped into memory using mmap(2) back to disk. Without use of this
    call there is no guarantee that changes are written back before
    munmap(2) is called.
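So if you only need durability at specific points, one option is to flush explicitly there and avoid paying for it elsewhere. A sketch using mapped_region::flush, which performs the msync:

    #include <boost/interprocess/mapped_region.hpp>

    namespace bip = boost::interprocess;

    // Flush only at explicit durability points instead of on every unmap.
    void checkpoint(bip::mapped_region &region)
    {
        region.flush(0, region.get_size());  // msync under the hood
    }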
like to keep indexes of the data to speed query resolution. The utility is old-school CGI, so all its resources (such as indexes) have to be instantiated into memory each time the CGI process is started. I could
Have you (Andy) considered using FastCGI? It's still regular CGI, except that the handler is a long-running process instead of a one-shot process, so the startup overhead is amortized over many requests.
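The usual structure is to do the expensive setup once, outside the accept loop. A sketch using the fcgi_stdio compatibility layer, where build_index() stands in for whatever setup you need:

    #include <fcgi_stdio.h>

    int main()
    {
        /* build_index();  -- expensive setup happens once per process */
        while (FCGI_Accept() >= 0) {
            printf("Content-type: text/plain\r\n\r\n");
            printf("query handled by a long-lived process\n");
        }
        return 0;
    }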
On Dec 28, 2008, at 7:18 AM, Zeljko Vrba wrote:
On Sun, Dec 28, 2008 at 01:02:22PM +0100, Ion Gaztañaga wrote:
Andy Wiese wrote:
Does managed_shared_memory really need to call msync?
I don't know. Maybe even managed_mapped_file shouldn't call flush() in
Sorry to jump into the discussion. Here's a quote from the manual:
    msync() flushes changes made to the in-core copy of a file that was
    mapped into memory using mmap(2) back to disk. Without use of this
    call there is no guarantee that changes are written back before
    munmap(2) is called.
Poking around under the hood, I discover that I may have been naive about shared_memory_object. It appears that on OS X and FreeBSD, shared_memory_object is implemented as a file in the filesystem. I noticed this because shared_memory_object::remove was returning an error condition on FreeBSD, so I looked a little deeper at ::remove on both platforms. On OS X it simply removes a file in a tmp directory, and on FreeBSD it calls shm_unlink, about which the man page says that POSIX shared memory objects are implemented as files.

So iiuc, on my two target platforms at least, there is no fundamental difference between managed_mapped_file and managed_shared_memory. I should not expect to see any fundamental performance difference between them, and the msync call in question is probably correct. Someone please correct me if I'm mistaken.

My previous experience with shared memory IPC has been with shmget and its family. If those are also implemented as files, it has never mattered to me and I haven't noticed.
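For anyone curious, this is roughly how I stumbled onto it; remove() just returns false, and tracing it leads to a plain unlink on OS X and shm_unlink on FreeBSD (segment name illustrative):

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <iostream>

    namespace bip = boost::interprocess;

    int main()
    {
        // remove() returns false on failure; on OS X it unlinks a file
        // in a tmp directory, on FreeBSD it calls shm_unlink().
        if (!bip::shared_memory_object::remove("query_index_shm"))
            std::cerr << "remove failed (or nothing to remove)\n";
        return 0;
    }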
like to keep indexes of the data to speed query resolution. The utility is old-school CGI, so all its resources (such as indexes) have to be instantiated into memory each time the CGI process is started. I could
Have you (Andy) considered to use FastCGI? It's still a regular CGI, just that it's a long-running process instead of a one-shot process. So the overheads will be amortized over several requests.
Yep. Eventually FastCGI will be the way to go for the CGI implementation. In the current case, one target platform is a small embedded web server that isn't FastCGI-enabled, though I may be able to upgrade to Lighty or something in the future. However, I would also like to use the same library in other processes that access the same data, and those processes are short-lived, much like old-school CGI. So, my hope is to make a good-enough implementation for the one-shot scenario, and then use something like FastCGI where that is possible.
On Sun, Dec 28, 2008 at 12:57:13PM -0600, Andy Wiese wrote:
So iiuc, on my two target platforms at least, there is no fundamental difference between managed_mapped_file and managed_shared_memory. I should not expect to see any fundamental performance difference between them, and the msync call in question is probably correct. Someone please correct me if I'm mistaken.
Yes, calling msync() on file-backed storage is correct.
My previous experience with shared memory IPC has been with shmget and its family. If those are also implemented as files, it has never mattered to me and I haven't noticed.
They are not. SYSV shared memory segments are kernel objects. They _might_ be implemented via a special filesystem (UNIX likes to map memory pages internally to "vnodes"), but their operations are not forwarded to disk. In the old days SYSV SHM was not even pageable, though that has changed now.
So, my hope is to make a good-enough implementation for the one-shot scenario, and then use something like FastCGI where that is possible.
My advice is to find out how to persuade the interprocess library to use SYSV SHM, if at all possible.
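For reference, the raw shmget family looks like this (a sketch; the key path and size are arbitrary). Note that detaching does not flush anything to disk:

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        key_t key = ftok("/tmp", 'A');                /* derive a key from a path */
        int id = shmget(key, 4096, IPC_CREAT | 0600); /* kernel object, no file */
        if (id == -1) { perror("shmget"); return 1; }

        void *addr = shmat(id, 0, 0);                 /* attach the segment */
        if (addr == (void *)-1) { perror("shmat"); return 1; }

        std::strcpy((char *)addr, "hello");           /* use it like plain memory */

        shmdt(addr);                                  /* detach: no msync here */
        /* shmctl(id, IPC_RMID, 0); */                /* remove when finished */
        return 0;
    }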
On Sunday 28 December 2008, Andy Wiese wrote:
Yep. Eventually FastCGI will be the way to go for the CGI implementation. In the current case, one target platform is a small embedded web server that isn't FastCGI-enabled, though I may be able to upgrade to Lighty or something in the future. However, I would also like to use the same library in other processes that access the same data, and those processes are short-lived, much like old-school CGI. So, my hope is to make a good-enough implementation for the one-shot scenario, and then use something like FastCGI where that is possible.
Did you consider creating a server process that holds the indexes, and just opening a pipe or socket to the server from your CGI processes to execute the query?

Lothar

--
Lothar Werzinger Dipl.-Ing. Univ.
framework & platform architect
Tradescape Inc. - Enabling Efficient Digital Marketplaces
1754 Technology Drive, Suite 128, San Jose, CA 95110
web: http://www.tradescape.biz
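P.S. A minimal sketch of the client side of what I mean; the socket path and wire format are made up:

    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdio>

    int main()
    {
        /* The short-lived CGI connects to a long-running index server. */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd == -1) { perror("socket"); return 1; }

        sockaddr_un addr;
        std::memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        std::strncpy(addr.sun_path, "/tmp/indexd.sock",
                     sizeof(addr.sun_path) - 1);

        if (connect(fd, (sockaddr *)&addr, sizeof(addr)) == -1) {
            perror("connect");
            return 1;
        }

        const char query[] = "find key=42\n";
        write(fd, query, sizeof(query) - 1);          /* send the query  */

        char reply[256];
        ssize_t n = read(fd, reply, sizeof(reply));   /* read the answer */
        if (n > 0) std::fwrite(reply, 1, (size_t)n, stdout);

        close(fd);
        return 0;
    }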
participants (4)
- Andy Wiese
- Ion Gaztañaga
- Lothar Werzinger
- Zeljko Vrba