Re: [boost] [stldb] design details (was Boost library submission (poll for interest))

On Tue, 5 Jan 2010 16:54:52 +0100 Stefan Strasser <strasser@uni-bremen.de> wrote:
I thought the reason you have to load the entire dataset at startup and save it entirely at checkpoints and recoveries is that the user needs to access the mapped region directly through pointers, since the user types exist in unserialized form there. is that right?
IIUC by load you mean reconstruct the memory mapped file from the checkpoints + logs? The only time that I have to reconstruct the memory mapped file or shared memory region using the checkpoints and logs is when recovery is done, NOT during routine/controlled startup and shutdown. See the architecture page: http://stldb.sourceforge.net/stldb/arch.html. I specifically wrote this page after our prior discussions.

If processes disconnect normally, and recovery is not needed, then I reuse the region. When recovery is required, I do have the problem of having to reconstruct the memory mapped file using the whole of the last checkpoint. Thus I agree your shadow paging of the container in the memory map would reduce recovery time.

Note that I am changing my checkpoint process so that each subsequent checkpoint will only write out changes since the last checkpoint, and do so by reusing free portions of the checkpoint file. Consequently the I/O directed to both the log and checkpoints is in proportion to the rate of change, not to the size of the database.

STLdb was originally designed as a memory-resident database. It should do that well, since it provides direct access to containers in a shared region. My thinking was that if the user elects to use a memory mapped file for the region, they then have a paged database. The I/O directed at checkpoints and logs is based entirely on rate of change. However, that paged database then introduces redundant disk I/O, since it will page out per LRU while I also occasionally write changed entries out to the checkpoints as checkpoints are done.

IIUC you have a similar problem, but it involves redundancy of memory. If you work with a large intrusive container, then you have the serialized storage memory mapped in, but must also cache the deserialized nodes of the container in memory to ensure fast locator traversal and allow it to work like an index.
What I am currently very intrigued by is your idea of using STLdb as indexes for a large database, i.e. the containers are map<key,locator> and refer to separately stored Boost.Persistence objects. Ideally, the containers will be small and tend to be held in memory (due to frequent use), so the double-write problem goes away, since the OS won't be paging the memory map. A user could even mandate that the indexes stay memory resident by using a shared memory region type rather than a memory map. There's another benefit to this: it would allow heterogeneous containers, along the lines of map<key,base*>, because of the locator's use of boost serialization. And because the real value type is not in shared memory, the stored objects can once again have virtual methods, etc.
what confuses me is the "Updating Entries" section and the fact that you do track changes made by the user, e.g. via trans_map::update. so if the library is aware of every change to the mapped region, what stops you from employing a shadow paging technique and only writing to the mapped region on transaction commit, when the modified pages are logged?
that would make your library MUCH more widely usable.
Right now, the interaction between the containers and the rest of the transactional infrastructure is based on an approach inspired by the stasis library (http://tardis.cs.berkeley.edu/stasis/developers/html/main.html). There is a concept of a transactionally aware object. As modifiers on transactionally aware objects are called, those objects post 'TransactionalOperations' onto the transaction passed to them. These ops represent the means by which the transaction can ultimately commit or roll back that operation when the time comes. Containers like trans_map are transactionally aware and define their own atomic operations (in the case of trans_map: insert(), lock(), update(), erase(), clear() and swap()). When these operations are logged, it is the logical operation (insert<5,"foobar">) which is logged, not the changes done to memory, so that during recovery it can be replayed. I am not logging memory changes.

A big contrast between this STLdb library and Boost.Persistence is that in your case, IIUC, you are tracking the use and modification of objects, but the objects themselves are unaware. One downside of this is that the objects can't collaborate to support piecemeal updates of small portions of the objects - it's always the full object which is re-serialized. Because I was starting with the notion of persisting containers, I sought to devise a model for a transaction-aware object which supported updates against portions of the object. Thus I came up with the idea of the transactional objects supporting one or more transactional operations. The tracking of changes is based on the use of these transactional operations.

The only reason I went with methods like lock() and update() is that while I could have done this implicitly via non-const access to iterators (for update, anyway), I thought that implicit approach would be an easy source of bugs, e.g. my 1M find()s become 1M find()s and update()s because I forgot to make the getName() method on my object const.
Are you requiring const correctness when accessing locators? I really like your interface in Boost.Persistent, but are you currently assuming an update when a locator is accessed non-const, or is it more clever than that, i.e. making a transaction-local copy during non-const access, but later checking for differences between the transaction-local copy and original before reserializing and storing the object?
because on the other hand, dereferencing trans_map::iterator returns a non-const reference to value_type, indicating the opposite.
Bug. I originally had the iterator always returning a const reference, to be consistent with the demand that users use update(). However, I ran into some kind of difficulty and never circled back to it (and can't currently recall why). It will be addressed in the final submission. Also, keep in mind that the code I have currently has not been submitted to Boost. It's ugly. The interfaces need work, but what currently exists does run. I am not proud of some of what's in there currently.
what kind of tree does trans_map use internally?
Red/Black per boost::interprocess::map, which is used (intact) internally for the low-level map implementation. I'm wrapping that with code to make it transactionally aware. I didn't want yet another implementation of map.
I can't wrap my head around the locking or MVCC used for isolation. could you explain this a little more in detail?
does the library automatically record/lock accesses? is the user supposed to lock manually on every access?
The isolation behavior is based on RDBMS conventions for row-level locking. Reads do not lock entries; writes do lock entries. These entry-level locks last until commit/rollback. A row can be locked directly (for a pessimistic locking convention) by using the lock() method on the entry. This was needed because I do have blocking capabilities on modifier methods, and thus needed a way to keep one thread's blocking update from overwriting another. See the section in the "quick start" which shows lock() being used. It corresponds to the RDBMS concept of "select for update" and is used for the same reasons.

I have been thinking about adding optimistic locking support based on specialization of the map for a value type implementing some optimistically lockable interface. Doing so would then allow you to avoid any call to lock(), but present the chance of failure during update(). This is all per RDBMS row-update conventions, and ultimately I planned to support both, just as many RDBMSs support both, giving the user the choice.

Incidentally, one problem I have with Boost.Persistence is that you always handle contention by throwing. IIUC, contention can occur when two threads insert records with different keys into a container simply because they would both add their node under the same parent, despite their nodes having different keys. So in other words, there's nothing apps can do to prevent the possibility of contention errors. This can backfire badly if the probability of contention is high because of app design. For example, someone writes a transaction which does several things, one of which is to push_back something to a queue. The transactions which did the push_back would need to be completely synchronized, IIUC. Support for blocking was one of the RDBMS world's answers to this kind of problem, and the reason I included support for it in the trans_map container design.
Thus I'm drawing the conclusion that implementing containers based on the locator concept's handling of concurrency might not always be the right idea. btw... I'm still not settled on the interface for lock() / update() and have wondered if I should somehow put that onto the iterator. E.g. lock() would be a method of the iterator, and update() could be a method on the iterator which returns a non-const l-value reference that can then be updated directly. Suggestions welcome...
how are possible read-accesses over an entire range stored, e.g. by using map::equal_range()?
If you hold a shared lock on the map while accessing it via a generic algorithm you are guaranteed to see the data reflecting a consistent set of transactions. In between locks, the result of new commits can appear, leading to the various phantom scenarios defined in the ANSI isolation levels. Holding a shared lock on the map prevents other transactions from committing changes to it while the lock is in effect. You aren't meant to hold locks on the containers for a very long time if you want good concurrency, but it can be done.

thanks for the detailed response. Am Thursday 07 January 2010 18:36:11 schrieb Bob Walters:
A big contrast between this STLdb library and Boost.Persistence is that in your case, IIUC, you are tracking the use and modification of objects, but the objects themselves are unaware. One downside of this is that the objects can't collaborate to support piecemeal updates of small portions of the objects - it's always the full object which is re-serialized.
right. that's why we need both libraries.
On Tue, 5 Jan 2010 16:54:52 +0100 Stefan Strasser <strasser@uni-bremen.de> wrote:
I thought the reason you have to load the entire dataset at startup and save it entirely at checkpoints and recoveries is that the user needs to access the mapped region directly through pointers, since the user types exist in unserialized form there. is that right?
IIUC by load you mean reconstruct the memory mapped file from the checkpoints + logs?
that's what I meant above, yes, but what I really mean is any kind of operation that is linear in the database size instead of the working set. I thought this was necessary at startup (of the process that creates the shared memory segment) by design. if I was wrong here, good. but I still think that triggering such an operation by anything that happens during the normal use of a database is near-catastrophic. that imho includes a crashing application.
IIUC you have a similar problem but it involves redundancy of memory. If you work with a large intrusive container, then you have the serialized storage memory mapped in, but must also cache the deserialized nodes of the container in memory to ensure fast locator traversal, and allow it to work like an index.
yes
What I am currently very intrigued by is your idea of using STLdb as indexes for a large database. i.e. The containers are map<key,locator>, and refer to separately stored Boost.Persistence objects.
me too. if we reach that point, automatic global indexes like you'd expect from a relational database could be implemented pretty easily:

struct pers_type{
    string name;
    template<class Archive>
    void serialize(Archive &ar, unsigned int){
        ar & persistent::index<name_tag>(name);
    }
};

voila, you can look up pers_type's by name. there are two elements stored in a locator:
- the object id, which is specified to be POD.
- a tag representing the resource the object is stored in.
in the case of storing a locator in a map, the tag would probably be stored as an index into a list of resources to make the map element fixed size.
Are you requiring const correctness when accessing locators? I really like your interface in Boost.Persistent, but are you currently assuming an update when a locator is accessed non-const, or is it more clever than that, i.e. making a transaction-local copy during non-const access, but later checking for differences between the transaction-local copy and original before reserializing and storing the object?
yes, dereferencing a non-const locator causes the resource to compare the object on commit to determine if it has changed, either by calling a user-supplied function or by serializing both the original and the clone and comparing the serialized streams for equality. not doing that would seriously harm concurrency in a lot of cases (besides making unnecessary updates). Locator::write() can be used to indicate an object will definitely be modified and avoid the above for performance. I have to add a chapter about dereferencing, read() and write() and their performance implications to the documentation.
what kind of tree does trans_map use internally?
Red/Black per boost::interprocess::map, which is used (intact) internally for the low-level map implementation. I'm wrapping that with code to make it transactionally aware. I didn't want yet another implementation of map.
I implemented a B-tree once. I don't remember the quality of the code, and it's C++/CLI, but if you want it, let me know...
does the library automatically record/lock accesses? is the user supposed to lock manually on every access?
The isolation behavior is based on RDBMS conventions for row-level locking. Reads do not lock entries, writes do lock entries. These entry-level
as far as I know, MySQL InnoDB does acquire a shared lock on reads.
Incidentally, one problem I have with Boost.Persistence is that you always handle contention by throwing. IIUC contention can occur when two threads insert records with different keys into a container simply because they would both add their node under the same parent, and despite their nodes having different keys. So in other words, there's nothing apps can do to prevent the possibility of contention errors.
I have thought about implementing an object resource that uses locking instead of MVCC (this is a limitation of the current implementation only, not of the design/interface), but while that would improve performance in some situations (especially when there are few conflicts), I don't understand why that would change the likelihood of conflicts or how an application could avoid conflicts. in your example of two nodes added under the same parent node, both transactions change the parent node, so there is a conflict whether you use MVCC or locking. are you talking about the cost of repeating a transaction after an MVCC conflict if that transaction is large and contains many other operations? do you see a way around that? except implementing a locking object resource, which is possible without changing any of the public interface.

btw, my "Intrusive Persistent Object Containers" are not meant to be indexes. I intentionally removed the non-intrusive versions I already had implemented, because I don't want people to think these should be used e.g. as map<int,int>. if you already have a bunch of persistent objects anyway, and you want to arrange them in a list or a tree, you can use those containers to do the work for you. but they are not meant to "index" an int.
btw... I'm still not settled on the interface for lock() / update() and have wondered if I should somehow put that onto the iterator. E.g. lock() would be a method of the iterator, and update() could be a method on the iterator which returns a non-const L-value reference that can then be updated directly. Suggestions welcome...
I'm not experienced enough with the row locking of RDBMSs to make any suggestions here. in general, functions modifying a container are not part of the iterator in STL, but part of the container, with the iterator as an argument (there is no Iterator::erase()).
how are possible read-accesses over an entire range stored, e.g. by using map::equal_range()?
If you hold a shared lock on the map while accessing it via a generic algorithm you are guaranteed to see the data reflecting a consistent set of transactions. In between locks, the result of new commits can appear, leading to the various phantom scenarios defined in the ANSI isolation levels.
how do you make sure the following transaction fails? (assume an isolation level here that does not allow the transaction to succeed. even if your current container doesn't support full isolation, it should be possible to implement one with the same interface, right?)

map<int,int> numbers;
map<int,int> numbersX2;

numbersX2[x] is guaranteed to be numbers[x] * 2; whenever a transaction changes one index, it also changes the other. now a transaction A comes along that reads an element of "numbers" and acts on it, then transaction B updates both containers, then transaction A reads the same element of numbersX2 and acts on it. the state presented to transaction A was inconsistent. how is transaction A prevented from committing? do I need to know that transaction A accesses both containers when I implement transaction A, and lock both containers beforehand?

Hi Bob, Bob Walters wrote:
Right now, the interaction between the containers and the rest of the transactional infrastructure is based on an approach inspired by the stasis library (http://tardis.cs.berkeley.edu/stasis/developers/html/main.html). There is a concept of a transactionally aware object. As modifiers on transactionally aware objects are called, those objects post 'TransactionalOperations' onto the transaction passed to them. These ops represent the means by which the transaction can ultimately commit or roll back that operation when the time comes.
Can you comment on the performance of this compared to a regular std::map<> or other data structure? One potential benefit of memory mapping - compared to e.g. an SQL database connection - is that you should be able to get performance very close to regular "in-application" data access. Regards, Phil.
participants (3)
-
Bob Walters
-
Phil Endecott
-
Stefan Strasser