
The one killer limitation of shmem (that I'm pretty sure Ion is working hard to remove) is that the shared memory region cannot be grown once it has been created. This is where your memory-mapped "persist" library has a leg up.
The problem is quite hard to solve if you allow shared memory to be placed at different base addresses in different processes. And performance would suffer if every pointer access had to check whether the memory segment it points into is already mapped. To identify each segment, a pointer would have to carry the name of the segment plus an offset, and every access would imply discovering the real address of that segment in the process dereferencing the pointer. That's a really hard task, and performance would suffer a lot. I think previous efforts with growing shared memory ("A C++ Pooled, Shared Memory Allocator For The Standard Template Library", http://allocator.sourceforge.net/) use fixed memory mappings in different processes. But this is an issue I would like to solve after the first version of Shmem is presented for review (I plan to do this shortly, within two months).
Memory-mapped files are another matter. Disk blocks can be dispersed across the disk, but the OS gives you the illusion that all the data is contiguous. Currently in Shmem, when using a memory-mapped file as the memory backend, if your memory-mapped file is full of data you can grow the file and remap it, so you have more space to work with. An in-memory DB can easily be implemented using this technique: when an insertion into any object allocated in the memory-mapped file throws boost::shmem::bad_alloc, you just call:
named_mfile_object->grow(1000000/*additional bytes*/);
and the file grows and you can continue allocating objects.
Couldn't the allocator do this instead of asking the user to do it? It would be better if the container did not need special code for different allocators.
Take care, because the OS might have changed the mapping address. In Shmem you can obtain offsets to objects to recover the new address of a remapped object. You can use the same technique with heap memory. The trick in Shmem is that, to achieve maximum performance, the memory space must be contiguous. For growing memory and persistent data, memory-mapped files are available in Shmem. Maybe that's not enough for a relational DB, but I would be happy to work with the RTL library on this.
I've downloaded RML and I've seen that the "mt_tree" class uses raw pointers in the red-black tree algorithm. If you use memory-mapped files and you store raw pointers there, the file is unusable unless you map it again at exactly the same address where you created it. All data in the memory-mapped file must be base-address independent. That's why Shmem uses offset_ptrs and containers that accept this kind of pointer. So if we want to achieve persistence with RTL we must develop base-independent containers. This is not a hard task, but porting, for example, multiindex to offset_ptrs is not a one-day job.
If you can make the assumption that memory will not move, it makes the implementation a lot simpler. There is a certain overhead in offsetting pointers on every dereference, and the red-black tree algorithms are quite pointer-intensive. mt_tree could certainly use the pointer type from the allocator, and I'll put that into my next release of RML.

Persist's approach uses a pool of mapped memory, thereby avoiding the need to move memory. [To people unfamiliar with mmap(): a file does not have to be mapped contiguously into the address space.] Allocating more memory means mapping another block, and no memory needs to be moved. Although I haven't seen it in practice, it is certainly a theoretical possibility that the OS will refuse to map the file back to the same memory addresses the next time the program is run, and this is the one reason why I haven't been pushing the Persist library: I just can't guarantee its safety. My feeling is that if the address space were large enough (i.e. 64-bit) and the OS could guarantee to map to a specific address, then the offset_ptr workaround would become unnecessary.

The other problem is that other threads won't be expecting objects to move. This means that you can't have concurrent access to your memory-mapped data. Also, if the file is shared between processes and you grow the file in one process, when does another process detect the change?

My feeling is that safety is paramount, and that it is better to have a safe, slower implementation using offset_ptrs than to use absolute memory addresses and risk mmap() failure. Alternatively, the application could be made robust to mmap() failure, for example if the memory-mapped data could be reconstructed from another data source. You could perhaps provide two allocators in Shmem: one that uses offset_ptrs and another that does not.

Regards,

Calum