
Hi Stefan,

On Fri, Dec 11, 2009 at 1:07 AM, Stefan Strasser <strasser@uni-bremen.de> wrote:
> This is not a Boost question, but it is for the development of a Boost library, so maybe a database expert on the list could help me out?
I'm not a database expert, but I have dealt with databases for the better part of my development career, so let me share some of the insights I've gained working with MySQL and InnoDB.
> I've been benchmarking some test cases of my library against MySQL InnoDB, which isn't even apples to oranges anymore, but it nevertheless gave some insights.
> What I find surprising is the performance of MySQL for small transactions: while my library spends 90% of its time waiting for the disk to flush a transaction, InnoDB uses 100% CPU and barely touches the disk.
Am I reading this correctly: you're persisting tuples to disk, allowing indexed access to the persisted data, and then letting someone embed all of that in their application? That sounds a lot like an embedded database system, so comparing against SQLite and BDB might make more sense than comparing against InnoDB. But then that's just my take on it. ;)
> I've reduced the calls to fsync() for small transactions to one call per transaction after some sequential writing, so I'm wondering how MySQL accomplishes this. By definition there must be a flush per transaction, unless you loosen the requirements of the D in ACID.
> Is that what InnoDB is doing? The only way I see to reduce flushes even further is to combine some transactions, but then after a power failure some of those transactions could be lost even though they had been reported as successfully committed.
This depends completely on how you've configured InnoDB. There's a setting that makes it flush after every transaction commit, a setting that makes it flush once a second instead of after every commit, and a setting that writes the log entry at every commit but flushes the data to disk later.

The way InnoDB does it is that it keeps the index and the data in its memory pool -- the CPU usage you're seeing is probably cache misses from a full-table scan. When you commit a transaction, the B-tree index in memory is updated and an entry is appended to the log. When the flush happens is completely configurable: right after the B-tree index gets updated, after a timeout, or with only the log flushed at commit and the index+data written out later.

Another way they do away with the flush is by using memory-mapped regions on Linux/UNIX and delegating the flushing to the operating system. InnoDB also supports raw storage management, in which case it bypasses the normal filesystem flushing and manages raw block devices on its own, using lower-level access to the device.
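If you want to experiment with this, the setting that controls it is innodb_flush_log_at_trx_commit (this is from memory, so double-check the manual). A minimal my.cnf sketch:

    [mysqld]
    # 1 (the default): write and flush the log at every commit -- full durability
    # 0: write and flush the log roughly once per second -- recently
    #    committed transactions can be lost on a crash
    # 2: write the log at every commit, but flush it to disk only about
    #    once per second
    innodb_flush_log_at_trx_commit = 1

With 0 or 2 you trade up to about a second's worth of committed transactions for far fewer flushes, which would explain the behavior you're benchmarking.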
> I have not examined their source code yet. The MySQL documentation doesn't say anything about loosening Durability. (Although you have to look pretty hard to see that the default isolation level isn't Serializable, either.)
There's a section in the MySQL manual about InnoDB log file flushing. I would have included a link, but I think the information above should already be helpful to you.
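One more thought on your idea of combining transactions: that's essentially what's known as group commit, where concurrent committers share a single fsync() instead of each issuing their own. Here's a rough sketch of the pattern in C++ -- the names and structure are mine, error handling is omitted, and this is definitely not InnoDB's actual implementation:

    // Minimal group-commit sketch: committers queue their log records,
    // and whichever thread flushes next does one fsync() that covers
    // every record queued so far.
    #include <condition_variable>
    #include <cstdint>
    #include <mutex>
    #include <string>
    #include <vector>
    #include <unistd.h>   // ::write(), ::fsync()

    class group_commit_log {
    public:
        explicit group_commit_log(int log_fd) : fd_(log_fd) {}

        // Blocks until this record is durable, but many concurrent
        // commits are covered by a single fsync().
        void commit(const std::string& record) {
            std::unique_lock<std::mutex> lock(mutex_);
            pending_.push_back(record);
            const std::uint64_t my_seq = ++last_queued_;

            while (last_flushed_ < my_seq) {
                if (flushing_) {
                    // someone else is flushing; wait, then re-check
                    done_.wait(lock, [&] { return !flushing_; });
                } else {
                    // become the flusher for everything queued so far
                    flushing_ = true;
                    const std::uint64_t batch_end = last_queued_;
                    std::vector<std::string> batch;
                    batch.swap(pending_);

                    lock.unlock();
                    for (const std::string& r : batch)
                        ::write(fd_, r.data(), r.size());  // error handling omitted
                    ::fsync(fd_);   // one flush for the whole batch
                    lock.lock();

                    last_flushed_ = batch_end;
                    flushing_ = false;
                    done_.notify_all();
                }
            }
        }

    private:
        int fd_;
        std::mutex mutex_;
        std::condition_variable done_;
        std::vector<std::string> pending_;
        std::uint64_t last_queued_ = 0, last_flushed_ = 0;
        bool flushing_ = false;
    };

The durability guarantee is kept -- commit() doesn't return until the record is on disk -- but under concurrent load, N transactions pay for one flush instead of N.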
> Thanks in advance,
HTH!

BTW, I am interested in what you're doing, and maybe in using a library that tackles the embedded-database space in a modern C++ manner. Looking forward to hearing more about what you're working on.

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com