
On Fri, Jan 8, 2010 at 10:39 AM, Rutger ter Borg <rutger@terborg.net> wrote:
Very interesting. I've written a std::map interface to Berkeley DB, which provides quite a bit of functionality. Based on that, I have a couple of questions:
1) Transactional semantics: wouldn't it be easier to steal semantics from locks in threads? E.g., for the synchronous interface case, wouldn't
    map_type m_map;
    try {
        scoped_transaction trans( m_map );
        .. do stuff with the map ..
        trans.commit();
    } catch( transaction_error ) {
    }
be easier than passing the transaction everywhere?
Yes. I plan to add support for this, but I also intend to keep the explicit model, for two reasons: 1) the odd chance that a particular thread wants to work with multiple outstanding transactions, and 2) it makes it easy for me to write the many integrity-related test cases that I need. For example, I can write a test like this:

    transaction txn1( db );
    transaction txn2( db );
    map->insert( entry, txn1 );
    assert( map->find( entry, txn1 ) != map->end() );
    assert( map->find( entry, txn2 ) == map->end() );

That is, an uncommitted insert must be visible inside its own transaction but not to any other transaction.
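To make the isolation property concrete, here is a minimal, self-contained sketch. It is not the STLdb API: toy_map, toy_transaction, contains() and isolation_holds() are hypothetical names, and the "buffer writes privately, publish on commit" scheme is just one simple way to get this behaviour, assumed here purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

class toy_map;

// Hypothetical transaction: buffers its own writes until commit().
// Destroying it without commit() simply drops the buffered writes.
class toy_transaction {
public:
    explicit toy_transaction(toy_map& m) : map_(m) {}
    void commit();
private:
    friend class toy_map;
    toy_map& map_;
    std::map<std::string, std::string> pending_;
};

// Hypothetical map: every operation names the transaction it runs under.
class toy_map {
public:
    void insert(const std::string& k, const std::string& v, toy_transaction& txn) {
        txn.pending_[k] = v;  // visible only to txn until it commits
    }
    bool contains(const std::string& k, const toy_transaction& txn) const {
        // A transaction sees its own pending writes plus committed data.
        return txn.pending_.count(k) != 0 || committed_.count(k) != 0;
    }
private:
    friend class toy_transaction;
    std::map<std::string, std::string> committed_;
};

inline void toy_transaction::commit() {
    std::map<std::string, std::string>::const_iterator it;
    for (it = pending_.begin(); it != pending_.end(); ++it)
        map_.committed_[it->first] = it->second;
    pending_.clear();
}

// Mirrors the test above, then checks visibility after commit.
inline bool isolation_holds() {
    toy_map m;
    toy_transaction txn1(m), txn2(m);
    m.insert("entry", "value", txn1);
    bool isolated = m.contains("entry", txn1) && !m.contains("entry", txn2);
    txn1.commit();
    return isolated && m.contains("entry", txn2);
}
```

Because both transactions are ordinary objects passed explicitly, one thread can hold txn1 and txn2 at the same time, which is what makes this kind of test easy to write.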
2) what serialization models are you considering? I.e., for a map of int to doubles, serialization would be overkill, wouldn't it?
Yes. I've thought about having specializations to support different serialization models, so that a complex type might use Boost.Serialization, while concrete types can use direct byte copies, etc. This isn't done yet. Currently, I'm using map<string,string> in actual applications, so I haven't put any work into alternatives.
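As a rough sketch of what per-type serialization could look like (this is not existing STLdb code; codec and buffer are hypothetical names), a primary template can handle trivially copyable types with a byte copy, and other types get their own specialization. Only a std::string specialization is shown; one for a complex type could delegate to Boost.Serialization instead.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

typedef std::vector<unsigned char> buffer;

// Primary template: raw byte copy, valid only for trivially copyable types
// such as int or double. Other types must provide a specialization.
template <typename T>
struct codec {
    static_assert(std::is_trivially_copyable<T>::value,
                  "provide a codec specialization for this type");
    static buffer encode(const T& v) {
        buffer out(sizeof(T));
        std::memcpy(out.data(), &v, sizeof(T));
        return out;
    }
    static T decode(const buffer& in) {
        assert(in.size() == sizeof(T));
        T v;
        std::memcpy(&v, in.data(), sizeof(T));
        return v;
    }
};

// Specialization for std::string: store the characters themselves.
template <>
struct codec<std::string> {
    static buffer encode(const std::string& s) {
        return buffer(s.begin(), s.end());
    }
    static std::string decode(const buffer& in) {
        return std::string(in.begin(), in.end());
    }
};
```

With this shape, a map of int to double would pay only for two memcpy calls per entry, while the choice of codec stays a compile-time decision per type.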
3) have you considered things like key prefix-compression and storing keys and values in different files?
I have considered memory localization controls that cluster map nodes and keys into certain segments of a region, and leave values to use all other memory in the region. The point is that, for memory maps that can't be held memory-resident, this establishes hot spots which can stay paged in, so that a find() followed by value access incurs only one page-in. But nothing more.
4) how did you solve "map[ key ] = value" vs "something = map[ key ]"? Here, I resorted to a reference object that would do the .put() in case of assignment, and a .get() in case of an implicit conversion.
Actually, I haven't implemented that method yet. It needs an implicit transaction-passing technique before it can be implemented. The approach will probably be to require a scoped lock on the map for the duration of that call.
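For reference, the reference-object technique described in question 4 can be sketched as follows. This is only an illustration with hypothetical names (kv_store, put, get, puts); it ignores transactions and locking entirely, which is exactly the part that is still open in STLdb.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical store with explicit get/put, standing in for a
// database-backed map.
class kv_store {
public:
    // Proxy returned by operator[]; it defers the decision between
    // reading and writing until it is actually used.
    class reference {
    public:
        reference(kv_store& s, const std::string& k) : store_(s), key_(k) {}
        // m[k] = v  becomes  put(k, v)
        reference& operator=(const std::string& v) {
            store_.put(key_, v);
            return *this;
        }
        // std::string s = m[k]  becomes  get(k)
        operator std::string() const { return store_.get(key_); }
    private:
        kv_store& store_;
        std::string key_;
    };

    reference operator[](const std::string& k) { return reference(*this, k); }

    void put(const std::string& k, const std::string& v) {
        data_[k] = v;
        ++puts_;
    }
    std::string get(const std::string& k) const {
        std::map<std::string, std::string>::const_iterator it = data_.find(k);
        if (it == data_.end()) throw std::out_of_range("no such key: " + k);
        return it->second;
    }
    int puts() const { return puts_; }  // number of writes, for checking

private:
    std::map<std::string, std::string> data_;
    int puts_ = 0;
};
```

Unlike std::map::operator[], a read through this proxy never inserts a default value: reading a missing key throws, and only an assignment triggers a put().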
5) do you reach 200,000 transactions per second per thread? :-)
I'm assuming that you realize that the answer to this would depend on the transaction composition, the speed of the machine, and the number of values in the map during the test. ;) What I can say is that I will be running a comparative test between the maps in this database and an equivalent, multi-threaded, heap-based use of a std::map in the same way. All disk I/O in STLdb can be suppressed for the purpose of such testing, allowing an apples-to-apples comparison that should show how much overhead I am adding with the transactional infrastructure, and how I may be negatively affecting concurrency. The apps I use to do this will be checked in with the project, to support repeatability.