
The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
The problem is quite hard to solve if you allow shared memory to be placed at different base addresses in different processes. And performance would suffer if every pointer access had to check whether the memory segment it points into is already mapped. To identify each segment, a pointer would have to carry the name of the segment plus an offset, and every access would imply discovering the real address of that segment in the process dereferencing the pointer. That's a really hard task, and performance would suffer a lot. I think previous efforts with growing shared memory ("A C++ Pooled, Shared Memory Allocator For The Standard Template Library", http://allocator.sourceforge.net/) use fixed memory mappings in different processes. But this is an issue I would like to solve after the first version of Shmem is presented for review (I plan to do this shortly, within two months).
Memory-mapped files are another matter. Disk blocks can be dispersed across the disk, but the OS gives you the illusion that all the data is contiguous. Currently in Shmem, when using a memory-mapped file as the memory backend, if your memory-mapped file is full of data you can grow the file and remap it, so you have more space to work with. An in-memory DB can easily be implemented using this technique: when an insertion into any object allocated in the memory-mapped file throws boost::shmem::bad_alloc, you just call:
named_mfile_object->grow(1000000/*additional bytes*/);
and the file grows and you can continue allocating objects.
Couldn't the allocator do this instead of asking the user to do it? It would be better if the container did not need special code for different allocators.
Take care, because the OS might have changed the mapping address. In Shmem you can obtain offsets to objects to recover the new address of a remapped object. You can use the same technique with heap memory. The trick in Shmem is that, to achieve maximum performance, the memory space must be contiguous. For growing memory and persistent data, memory-mapped files are available in Shmem. Maybe that's not enough for a relational DB, but I would be happy to work with the RTL library on this.
I've downloaded RML and I've seen that the "mt_tree" class uses raw pointers in the red-black tree algorithm. If you use memory-mapped files and you store raw pointers there, the file is unusable unless you map it again at exactly the same address where you created it. All data in the memory-mapped file must be base-address independent. That's why Shmem uses offset_ptrs and containers that accept this kind of pointer. So if we want to achieve persistence with RTL we must develop base-independent containers. This is not a hard task, but porting, for example, multiindex to offset_ptrs is not a one-day job.
If you can make the assumption that memory will not move, it makes the implementation a lot simpler. There is a certain overhead in offsetting pointers on every dereference, and the red-black tree algorithms are quite pointer-intensive. mt_tree could certainly use the pointer type from the allocator, and I'll put that into my next release of RML.

Persist's approach uses a pool of mapped memory, thereby avoiding the need to move memory. [To people unfamiliar with mmap(): a file does not have to be mapped contiguously into the address space.] Allocating more memory means mapping another block, and no memory needs to be moved. Although I haven't seen it in practice, it is certainly a theoretical possibility that the OS will refuse to map the file back to the same memory addresses the next time the program is run, and this is the one reason why I haven't been pushing the Persist library: I just can't guarantee its safety. My feeling is that if the address space were large enough (i.e. 64-bit) and the OS could guarantee to map to a specific address, then the offset_ptr workaround would become unnecessary.

The other problem is that other threads won't be expecting objects to move. This means that you can't have concurrent access to your memory-mapped data. Also, if the file is shared between processes and you grow the file in one process, when does another process detect the change?

My feeling is that safety is paramount, and that it is better to have a safe, slower implementation using offset_ptrs than to use absolute memory addresses and risk mmap() failure. Alternatively, the application could be made robust to mmap() failure, for example if the memory-mapped data could be reconstructed from another data source. You could perhaps provide two allocators in Shmem: one that uses offset_ptrs and another that does not.

Regards,

Calum