
On Tue, 5 Jan 2010 16:54:52 +0100 Stefan Strasser <strasser@uni-bremen.de> wrote:
I thought the reason that requires you to load the entire dataset on startup, and save it entirely at checkpoints and recoveries, is that the user needs to access the mapped region directly through pointers, since the user types exist there in unserialized form. Is that right?
IIUC by load you mean reconstruct the memory-mapped file from the checkpoints + logs? The only time that I have to reconstruct the memory-mapped file or shared memory region from the checkpoints and logs is when recovery is done, NOT during routine/controlled startup and shutdown. See the architecture page: http://stldb.sourceforge.net/stldb/arch.html. I specifically wrote this page after our prior discussions. If processes disconnect normally and recovery is not needed, then I reuse the region.

When recovery is required, I do have the problem of having to reconstruct the memory-mapped file from the whole of the last checkpoint. Thus I agree that your shadow paging of the container in the memory map would reduce recovery time. Note that I am changing my checkpoint process so that each subsequent checkpoint writes out only the changes since the last checkpoint, reusing free portions of the checkpoint file. Consequently, the I/O directed at both the log and the checkpoints is proportional to the rate of change, not to the size of the database.

STLdb was originally designed as a memory-resident database. It should do that well, since it provides direct access to containers in a shared region. My thinking was that if the user elects to use a memory-mapped file for the region, they then have a paged database, where the I/O directed at checkpoints and logs is based entirely on the rate of change. However, that paged database then introduces redundant disk I/O, since the OS will page out per LRU while I also occasionally write changed entries out to the checkpoints as checkpoints are done.

IIUC you have a similar problem, but it involves redundancy of memory: if you work with a large intrusive container, you have the serialized storage memory-mapped in, but must also cache the deserialized nodes of the container in memory to ensure fast locator traversal, and to allow it to work like an index.
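The incremental checkpoint idea above can be sketched as a dirty-page set: only pages touched since the last checkpoint get written, so checkpoint I/O tracks the rate of change rather than the region size. Everything here (Region, checkpoint(), the page size) is an invented illustration, not STLdb's actual implementation:

```cpp
// Sketch of incremental checkpointing: only pages dirtied since the last
// checkpoint are written out. All names here are hypothetical.
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

struct Region {
    static const std::size_t kPageSize = 4096;
    std::vector<char> pages;          // backing "memory-mapped" region
    std::set<std::size_t> dirty;      // page indices touched since last checkpoint

    void write(std::size_t offset, char value) {
        pages[offset] = value;
        dirty.insert(offset / kPageSize);       // record the dirtied page
    }

    // Returns the number of pages written: proportional to the rate of
    // change, not to the size of the region.
    std::size_t checkpoint(std::map<std::size_t, std::vector<char> >& file) {
        std::size_t written = 0;
        for (std::set<std::size_t>::iterator it = dirty.begin(); it != dirty.end(); ++it) {
            std::size_t base = *it * kPageSize;
            file[*it].assign(pages.begin() + base, pages.begin() + base + kPageSize);
            ++written;
        }
        dirty.clear();
        return written;
    }
};
```

A real implementation would also reuse free slots in the checkpoint file, as described above; the sketch just shows the proportionality.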
What I am currently very intrigued by is your idea of using STLdb containers as indexes for a large database, i.e. the containers are map<key,locator> and refer to separately stored Boost.Persistence objects. Ideally, the containers will be small and tend to be held in memory (due to frequent use), so the double-write problem goes away since the OS won't be paging the memory map. A user could even mandate that the indexes stay memory resident by using a shared memory region type rather than a memory map. There's another benefit to this: it would allow heterogeneous containers, along the lines of map<key,base*>, because of the locator's use of Boost.Serialization. And because the real value type is not in shared memory, the values can once again have virtual methods, etc.
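The shape of that index idea, reduced to plain standard containers: the transactional map holds only small locator values, while the full (possibly polymorphic) objects live outside the shared region. locator_t and the store are invented stand-ins for illustration; neither STLdb nor Boost.Persistence is actually used here:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Because the real value type lives outside the shared region, it may be
// polymorphic and carry virtual methods.
struct base { virtual ~base() {} virtual std::string kind() const = 0; };
struct order   : base { std::string kind() const override { return "order"; } };
struct invoice : base { std::string kind() const override { return "invoice"; } };

typedef long locator_t;                                       // invented locator stand-in
typedef std::map<locator_t, std::shared_ptr<base> > store_t;  // stand-in for the persistent store
typedef std::map<std::string, locator_t> index_t;             // the small, memory-resident index
```

The index stays heterogeneous in effect (map<key,base*>-style) because it only ever stores locators, never the objects themselves.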
What confuses me is the "Updating Entries" section and the fact that you do track changes made by the user, e.g. via trans_map::update. So if the library is aware of every change to the mapped region, what stops you from employing a shadow paging technique and only writing to the mapped region on transaction commit, when the modified pages are logged?
That would make your library MUCH more widely usable.
Right now, the interaction between the containers and the rest of the transactional infrastructure is based on an approach inspired by the Stasis library (http://tardis.cs.berkeley.edu/stasis/developers/html/main.html). There is a concept of a transactionally aware object. As modifiers on transactionally aware objects are called, those objects post 'TransactionalOperations' onto the transaction passed to them. These ops represent the means by which the transaction can ultimately commit or roll back that operation when the time comes. Containers like trans_map are transactionally aware and define their own atomic operations (in the case of trans_map: insert(), lock(), update(), erase(), clear() and swap()). When these operations are logged, it is the logical operation (insert<5,"foobar">) which is logged, not the changes done to memory, so that during recovery it can be replayed. I am not logging memory changes.

A big contrast between this STLdb library and Boost.Persistence is that in your case, IIUC, you are tracking the use and modification of objects, but the objects themselves are unaware. One downside of this is that the objects can't collaborate to support piecemeal updates of small portions of themselves; it's always the full object which is re-serialized. Because I was starting with the notion of persisting containers, I sought to devise a model for a transaction-aware object which supported updates against portions of the object. Thus I came up with the idea of transactional objects supporting one or more transactional operations. The tracking of changes is based on the use of these transactional operations.

The only reason I went with methods like lock() and update() is that while I could have done this implicitly via non-const access through iterators (for update, anyway), I thought that implicit approach would be an easy source of bugs, i.e. my 1M find()s become 1M find()-plus-update()s because I forgot to make the getName() method on my object const.
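The pattern described above, stripped to its bones: modifiers post operations onto the transaction; commit logs the logical operation (not memory bytes), and rollback replays the ops in reverse. All class names are invented for illustration; this is not STLdb's actual interface:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct txn_op {                                      // one logical, replayable operation
    virtual ~txn_op() {}
    virtual std::string log_record() const = 0;      // e.g. "insert<5,foobar>"
    virtual void rollback() = 0;
};

struct transaction {
    std::vector<std::unique_ptr<txn_op> > ops;
    std::vector<std::string> log;
    void commit()   { for (auto& op : ops) log.push_back(op->log_record()); ops.clear(); }
    void rollback() { for (auto it = ops.rbegin(); it != ops.rend(); ++it) (*it)->rollback(); ops.clear(); }
};

// A trivially "transactionally aware" map: insert() posts an op that can
// undo itself, and whose log record is the logical operation, not raw bytes.
struct txn_map {
    std::map<int, std::string> data;
    struct insert_op : txn_op {
        txn_map* m; int key; std::string val;
        insert_op(txn_map* m_, int k, std::string v) : m(m_), key(k), val(v) {}
        std::string log_record() const override {
            return "insert<" + std::to_string(key) + "," + val + ">";
        }
        void rollback() override { m->data.erase(key); }
    };
    void insert(transaction& t, int k, const std::string& v) {
        data[k] = v;
        t.ops.push_back(std::unique_ptr<txn_op>(new insert_op(this, k, v)));
    }
};
```

Because what reaches the log is the logical record, recovery can replay it against a reconstructed container regardless of where nodes landed in memory.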
Are you requiring const correctness when accessing locators? I really like your interface in Boost.Persistence, but are you currently assuming an update whenever a locator is accessed non-const, or is it more clever than that, i.e. making a transaction-local copy during non-const access, but later checking for differences between the transaction-local copy and the original before re-serializing and storing the object?
because on the other hand, dereferencing trans_map::iterator returns a non-const reference to value_type, indicating the opposite.
Bug. I originally had the iterator always returning a const reference, to be consistent with the demand that users use update(). However, I ran into some kind of difficulty and never circled back to it (and can't recall why at the moment). It will be addressed in the final submission. Also, keep in mind that the code I have currently has not been submitted to Boost. It's ugly. The interfaces need work, but what currently exists does run. I am not proud of some of what's in there currently.
what kind of tree does trans_map use internally?
Red/Black per boost::interprocess::map, which is used (intact) internally for the low-level map implementation. I'm wrapping that with code to make it transactionally aware. I didn't want yet another implementation of map.
I can't wrap my head around the locking or MVCC used for isolation. could you explain this a little more in detail?
does the library automatically record/lock accesses? is the user supposed to lock manually on every access?
The isolation behavior is based on RDBMS conventions for row-level locking. Reads do not lock entries; writes do. These entry-level locks last until commit/rollback. A row can be locked directly (for a pessimistic locking convention) by using the lock() method on the entry. This was needed because I do have blocking capabilities on modifier methods, and thus needed a way to keep one thread's blocking update from overwriting another's. See the section in the "quick start" which shows lock() being used. It corresponds to the RDBMS concept of "select for update" and is used for the same reasons.

I have been thinking about adding optimistic locking support based on a specialization of the map for a value type implementing some optimistically lockable interface. Doing so would allow you to avoid any call to lock(), but present the chance of failure during update(). This is all per RDBMS row-update conventions, and ultimately I planned to support both, just as many RDBMSs support both, giving the user the choice.

Incidentally, one problem I have with Boost.Persistence is that you always handle contention by throwing. IIUC, contention can occur when two threads insert records with different keys into a container simply because they would both add their node under the same parent, despite their nodes having different keys. In other words, there's nothing apps can do to prevent the possibility of contention errors. This can backfire badly if the probability of contention is high because of app design. For example, someone writes a transaction which does several things, one of which is to push_back something onto a queue. The transactions which did the push_back would need to be completely synchronized, IIUC. Support for blocking was one of the RDBMS world's answers to this kind of problem, and the reason I included support for it in the trans_map container design.
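The "select for update" convention described above can be sketched as follows: lock() pins an entry to a transaction until commit/rollback, and a competing transaction's update is refused (it would block, in a real implementation) while the lock is held. All names are invented for illustration; see the STLdb quick start for the real lock() usage:

```cpp
#include <cassert>
#include <map>
#include <string>

struct entry { std::string value; int locked_by = 0; };   // 0 = unlocked

struct locked_map {
    std::map<int, entry> rows;

    bool lock(int txn, int key) {               // the "select for update" step
        entry& e = rows[key];
        if (e.locked_by != 0 && e.locked_by != txn) return false;  // would block here
        e.locked_by = txn;
        return true;
    }
    bool update(int txn, int key, const std::string& v) {
        entry& e = rows[key];
        if (e.locked_by != 0 && e.locked_by != txn) return false;
        e.locked_by = txn;                      // writes lock entries until commit
        e.value = v;
        return true;
    }
    void commit(int txn) {                      // release this txn's entry locks
        for (auto& kv : rows)
            if (kv.second.locked_by == txn) kv.second.locked_by = 0;
    }
};
```

Reads intentionally never touch locked_by, matching the "reads do not lock entries" convention above.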
Thus I'm drawing the conclusion that implementing containers based on the locator concept's handling of concurrency might not always be the right idea. Btw, I'm still not settled on the interface for lock()/update() and have wondered if I should somehow put that onto the iterator. E.g. lock() would be a method of the iterator, and update() could be a method on the iterator which returns a non-const lvalue reference that can then be updated directly. Suggestions welcome...
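For discussion's sake, the iterator-based variant floated above might look like this. It is purely a hypothetical shape, not an existing STLdb interface; the lock bookkeeping is reduced to a flag:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical iterator wrapper: lock() is a method of the iterator, and
// update() returns a non-const lvalue reference for direct modification,
// taking the row lock implicitly if it wasn't taken already.
template <class Map>
struct txn_iterator {
    typename Map::iterator it;
    bool locked = false;

    void lock() { locked = true; }            // would take the entry lock for real
    typename Map::mapped_type& update() {
        if (!locked) lock();                  // update() implies lock()
        return it->second;                    // non-const lvalue reference
    }
    const typename Map::mapped_type& operator*() const {
        return it->second;                    // plain reads stay const
    }
};
```

One nicety of this shape: const reads and locking writes are distinguished by which iterator method is called, rather than by a separate update() on the container.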
how are possible read-accesses over an entire range stored, e.g. by using map::equal_range()?
If you hold a shared lock on the map while accessing it via a generic algorithm you are guaranteed to see the data reflecting a consistent set of transactions. In between locks, the result of new commits can appear, leading to the various phantom scenarios defined in the ANSI isolation levels. Holding a shared lock on the map prevents other transactions from committing changes to it while the lock is in effect. You aren't meant to hold locks on the containers for a very long time if you want good concurrency, but it can be done.
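The container-level shared lock described above can be sketched with a standard reader/writer lock: while the shared lock is held, committers are excluded, so a generic algorithm over a range sees one consistent set of transactions. std::shared_timed_mutex here merely stands in for whatever lock STLdb actually uses; the names are illustrative only:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

struct guarded_map {
    std::map<int, std::string> data;
    mutable std::shared_timed_mutex mtx;      // readers share, committers exclude

    // Run a read-only visitor over an equal_range-style span under a shared
    // lock, so no commit can interleave mid-scan.
    template <class Fn>
    void for_range(int lo, int hi, Fn fn) const {
        std::shared_lock<std::shared_timed_mutex> rd(mtx);
        for (auto it = data.lower_bound(lo); it != data.upper_bound(hi); ++it)
            fn(*it);
    }

    void commit_insert(int k, const std::string& v) {
        std::unique_lock<std::shared_timed_mutex> wr(mtx);  // a commit takes it exclusively
        data[k] = v;
    }
};
```

Between two such scans the shared lock is dropped, which is exactly where newly committed rows can appear (the phantom scenarios of the ANSI isolation levels).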