
On Sat, Jan 23, 2010 at 10:58 PM, <strasser@uni-bremen.de> wrote:
Quoting Bob Walters <bob.s.walters@gmail.com>:
Stefan, IIUC, you are essentially proposing we implement resource managers per the XA standard, correct? (http://www.opengroup.org/onlinepubs/009680699/toc.pdf) I'm basing this on your previous reference to BerkeleyDB's XA APIs.
How far do you intend to go with ensuring the recoverability of these multi-resource transactions? Specifically, the XA standard requires that RMs persist the state of the transaction during prepare(), so that the transaction can still be completed or rolled back consistently across all RMs, even if the two-phase commit is interrupted by process failures.
I haven't read much of the XA standard, but I guess it's pretty much the same thing.
You also use write-ahead redo logging, don't you?
Yes.
So what the RMs need to do is split the commit into two phases, with a prepare message written to the log at the end of the prepare phase, and change the recovery process a bit.
Typically, this results in a lot of sync()s, but I'm not sure how to avoid that. For example, prepare() has to sync because the resource manager is not allowed to forget the transaction due to machine death. commit() presumably must sync again. The TM does its own logging (or perhaps shares a resource manager) to keep track of transaction progress. Once a two-phase commit protocol comes into play, things slow down. One consequence is that we should be sure that the TM is optimized for one-phase commits when only one RM is in play.
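As a rough illustration of the one-phase optimization mentioned above, here is a minimal sketch of a TM commit routine. The `ResourceManager` type, its `vote_yes` flag and call trace, and `commit_transaction()` are hypothetical names invented for this example; they are not the API of stdb, Boost.Persistence, or the XA standard, and logging and sync() calls are only indicated in comments.

```cpp
#include <string>
#include <vector>

// Hypothetical resource manager that only records the calls it receives.
struct ResourceManager {
    bool vote_yes = true;             // what prepare() will answer
    std::vector<std::string> calls;   // call trace, for illustration only

    bool prepare()  { calls.push_back("prepare");  return vote_yes; }
    void commit()   { calls.push_back("commit"); }
    void rollback() { calls.push_back("rollback"); }
};

// Hypothetical TM commit. Returns true if the transaction was committed.
bool commit_transaction(std::vector<ResourceManager*>& rms) {
    if (rms.size() == 1) {
        // One-phase path: no prepare record and no TM log entry are needed,
        // so the single RM can commit with one sync of its own log.
        rms.front()->commit();
        return true;
    }

    // Phase 1: ask each RM to durably promise that it can commit.
    bool all_yes = true;
    for (ResourceManager* rm : rms) {
        if (!rm->prepare()) { all_yes = false; break; }
    }

    if (!all_yes) {
        // Roll back every participant. A real RM that voted "no" may
        // already have rolled back on its own.
        for (ResourceManager* rm : rms) rm->rollback();
        return false;
    }

    // A real TM would log (and sync) its commit decision here.
    // Phase 2: tell every RM to commit.
    for (ResourceManager* rm : rms) rm->commit();
    return true;
}
```

With a single RM the trace contains only "commit"; with two or more, each RM sees "prepare" before "commit" or "rollback".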
Right now, when the log is read, a transaction is replayed based on whether there's a commit message or not. When there is a prepare message (but no commit or rollback), the RM needs to report that to the TM through a function like recover(), and the TM then decides whether that transaction ought to be replayed or rolled back.
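The recovery change described above can be sketched roughly as follows. The record layout, the `Outcome` enum, and the `classify()` and `recover()` functions are assumptions made for illustration; neither library's actual log format is known from this thread.

```cpp
#include <map>
#include <vector>

// Hypothetical log record kinds; Data stands for ordinary redo records.
enum class RecordType { Data, Prepare, Commit, Rollback };

struct LogRecord {
    int txn_id;
    RecordType type;
};

enum class Outcome { Replay, Discard, InDoubt };

// Classify each transaction by the control records found in the log.
// Without a prepare or commit record the transaction is discarded,
// which matches the existing one-phase behaviour.
std::map<int, Outcome> classify(const std::vector<LogRecord>& log) {
    std::map<int, Outcome> result;
    for (const LogRecord& rec : log) {
        switch (rec.type) {
        case RecordType::Data:
            result.emplace(rec.txn_id, Outcome::Discard);  // default, never overwrites
            break;
        case RecordType::Prepare:
            result[rec.txn_id] = Outcome::InDoubt;
            break;
        case RecordType::Commit:
            result[rec.txn_id] = Outcome::Replay;
            break;
        case RecordType::Rollback:
            result[rec.txn_id] = Outcome::Discard;
            break;
        }
    }
    return result;
}

// What an RM's recover() might hand to the TM: the ids of transactions
// that were prepared but neither committed nor rolled back.
std::vector<int> recover(const std::vector<LogRecord>& log) {
    std::vector<int> in_doubt;
    for (const auto& entry : classify(log)) {
        if (entry.second == Outcome::InDoubt) in_doubt.push_back(entry.first);
    }
    return in_doubt;
}
```

The TM would then answer each in-doubt id with a commit (replay) or a rollback, based on its own log.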
This makes sense. There is more to the RM API than just prepare(), rollback() and commit(). We will probably end up needing recover(), forget(), etc. per the XA API when this is all worked out.
I wonder if either of you has also considered the notion of sharing a transactional disk infrastructure (e.g. write-ahead log and checkpointing mechanism), so that the 3 different libraries could share a transactional context and produce a single write-ahead log record per transaction, i.e. one log overall instead of one log per RM.
Specifically: prepare() might have each RM create a buffer containing the data it needs in order to recover that RM's changes for the transaction. The combined data from all RMs is written to the write-ahead log as the record for the transaction. As a result, there is no need for a two-phase commit. Instead, during recovery processing, each RM is told to recover based on its particular portion of the log record contents. Checkpoints would likewise need to be coordinated in some similar fashion.
The reason I mention this is that the write-ahead logging and checkpointing being done by stdb and Boost.Persistence do sound comparable at this point, so I'm starting to think that looking at the problem from the vantage point of shared logging might prove optimal. Is this alternative worth looking at?
Vicente, I don't know enough about your library's design to tell whether this is feasible, and so must ask...
- Bob
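The shared-log idea above might look roughly like this in code. Everything here is a hypothetical sketch: `SharedLogRecord`, `prepare_buffer()`, `recover_from()`, `build_record()` and `replay()` are invented names, the redo data is treated as an opaque string, and the actual disk write and sync are left to the caller.

```cpp
#include <map>
#include <string>
#include <vector>

// One write-ahead log record per transaction, with one section per RM.
struct SharedLogRecord {
    int txn_id;
    std::map<std::string, std::string> sections;  // RM name -> opaque redo data
};

// Hypothetical RM interface for the shared-log scheme.
struct ResourceManager {
    std::string name;
    std::string pending;  // redo data for the current transaction
    std::string state;    // stand-in for the RM's recovered state

    std::string prepare_buffer() const { return pending; }
    void recover_from(const std::string& redo) { state += redo; }
};

// Gather every RM's buffer into a single record. The caller would append
// it to the shared log and sync once, instead of once per RM.
SharedLogRecord build_record(int txn_id,
                             const std::vector<ResourceManager*>& rms) {
    SharedLogRecord rec{txn_id, {}};
    for (const ResourceManager* rm : rms) {
        rec.sections[rm->name] = rm->prepare_buffer();
    }
    return rec;
}

// During recovery, hand each RM only its own portion of every record.
void replay(const std::vector<SharedLogRecord>& log,
            const std::vector<ResourceManager*>& rms) {
    for (const SharedLogRecord& rec : log) {
        for (ResourceManager* rm : rms) {
            auto it = rec.sections.find(rm->name);
            if (it != rec.sections.end()) rm->recover_from(it->second);
        }
    }
}
```

Because the record is written atomically as a whole, a transaction is either present for all RMs or for none, which is why no separate prepare phase is needed in this scheme.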