Boost library submission (poll for interest)

I have a library I would like to submit for inclusion in Boost. This message is just soliciting interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.

The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the database is structured as a Boost.Interprocess shared region (typically mmapped so it can page) which contains one or more STL-like containers holding application-defined data types in directly usable form (i.e. not serialized). The database infrastructure is fully ACID compliant, with durability via write-ahead logging and periodic checkpoints. More details, examples, and documentation on this library's capabilities are available at http://stldb.sourceforge.net/.

Currently, I've only implemented map<> (probably the most useful container type in a database), but the plan is to continue to add containers as requested/needed until ACID-compliant versions of a decent portion of the STL are available.

There are some concepts which I've been trying to flesh out with this, including a standard concept for a transaction-aware type which in turn exposes operations that can be logged, undone, recovered, etc. Allowing that concept to be used recursively (containers of other containers, or of application-defined transaction-aware objects) is one of the longer-term goals of this library.

Let me know if there's any interest, and I'll work on a preliminary submission for the sandbox.

Thanks,
Bob Walters

On 04/01/2010 23:56, Bob Walters wrote:
I have a library I would like to submit for inclusion in Boost. This message is just soliciting for interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.
Interesting.
The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the database is structured as a Boost.Interprocess shared region
Glad to know the library has some use ;-)
Currently, I've only implemented map<> (probably the most useful container type in a database), but the plan is to continue to add containers as requested/needed until ACID compliant versions of a decent portion of the STL are available.
I think most in-memory databases use T-trees as their main containers. I plan to add them as Interprocess containers in late 2010, but of course, that will depend on my free time. With those containers, we could speed up searches and waste less space. Best, Ion

Bob Walters wrote:
The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the database is structured as a Boost.Interprocess shared region (typically mmapped so it can page) which contains one or more STL-like containers holding application-defined data types in directly usable form (i.e. not serialized). The database infrastructure is fully ACID compliant, with durability via write-ahead logging and periodic checkpoints. More details, examples, and documentation on this library's capabilities are available at http://stldb.sourceforge.net/.
This seems very interesting from that description.
Currently, I've only implemented map<> (probably the most useful container type in a database), but the plan is to continue to add containers as requested/needed until ACID compliant versions of a decent portion of the STL are available.
I have used Boost.MultiIndex as a replacement for a database several times. It would be great if there could be support for everything Boost.MultiIndex can do.

On Monday 04 January 2010 23:56:32, Bob Walters wrote:
I have a library I would like to submit for inclusion in Boost. This message is just soliciting for interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.
Still very interesting ;-) However, after having a look at your documentation I'm confused again about the discussion we had off-list. I thought the reason you need to load the entire dataset on startup and save it entirely at checkpoints and recoveries is that the user needs to access the mapped region directly through pointers, since the user types exist in unserialized form there. Is that right?

What confuses me is the "Updating Entries" section and the fact that you do track changes made by the user, e.g. via trans_map::update. So if the library is aware of every change to the mapped region, what stops you from employing a shadow-paging technique and only writing to the mapped region on transaction commit, when the modified pages are logged? That would make your library MUCH more widely usable. Or is my assumption that the library is aware of every change incorrect? Because on the other hand, dereferencing trans_map::iterator returns a non-const reference to value_type, indicating the opposite. http://en.wikipedia.org/wiki/Shadow_paging
The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the
If the copying-entire-dataset approach is required after all, I'd like to suggest a change of the library's name to avoid confusion. BerkeleyDB's implementation of an STL container interface is called DbSTL, and I think there are other implementations by similar names out there which generally refer to "database access with an STL interface", but IIUC that is exactly the use case your library couldn't support: use as an embedded database (with a large dataset). Something like InterprocessContainers comes to mind. http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_ref...
Currently, I've only implemented map<> (probably the most useful container type in a database),
What kind of tree does trans_map use internally?
but the plan is to continue to add containers as requested/needed until ACID compliant versions of a decent portion of the STL are available.
I can't wrap my head around the locking or MVCC used for isolation. Could you explain this in a little more detail? Does the library automatically record/lock accesses? Is the user supposed to lock manually on every access? Information about transaction-local changes is saved inside a map entry, right? How are possible read accesses over an entire range stored, e.g. when using map::equal_range()? Thanks,

Hi, happy to see database and persistence libraries using transactions proliferating in Boost.

----- Original Message -----
From: "Bob Walters" <bob.s.walters@gmail.com>
To: <boost@lists.boost.org>
Sent: Monday, January 04, 2010 11:56 PM
Subject: [boost] Boost library submission (poll for interest)
I have a library I would like to submit for inclusion in Boost. This message is just soliciting for interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.
The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the database is structured as a Boost.Interprocess shared region (typically mmapped so it can page) which contains one or more STL-like containers holding application-defined data types in directly usable form (i.e. not serialized). The database infrastructure is fully ACID compliant, with durability via write-ahead logging and periodic checkpoints. More details, examples, and documentation on this library's capabilities are available at http://stldb.sourceforge.net/.
Let me know if there's any interest, and I'll work on a preliminary submission for the sandbox.
I'm interested in a library that has transactional shared memory, but something more general than your container database in shared memory. Can your library be extended to manage types that are not containers? If I have understood, the data is stored using Boost.Interprocess. Could you clarify why you force the stored types to be Serializable?

With your library, there are already 3 libraries under construction providing a transactional service (Boost.STLdb, Boost.Persistent and TBoost.STM). The ideal would be to have a single transactional framework with several transactional resources, e.g. a shared memory resource (Boost.STLdb), a persistent resource (Boost.Persistent), and an in-memory resource (Boost.STM). If I'm not wrong, Boost.Persistent already contains a Resource concept. I would like to see how this transactional framework can be made generic so the three libraries can share the same transaction. Stefan, Bob, are you interested in participating in such a framework?

Could you give a link from where to download the code? If you want, I can add your library to the Boost Libraries Under Construction page https://svn.boost.org/trac/boost/wiki/LibrariesUnderConstruction. Best, Vicente

On Tuesday 05 January 2010 21:21:29, vicente.botet wrote:
With your library, there are already 3 libraries under construction providing a transactional service (Boost.STLdb, Boost.Persistent and TBoost.STM). The ideal would be to have a single transactional framework with several transactional resources, e.g. a shared memory resource (Boost.STLdb), a persistent resource (Boost.Persistent), and an in-memory resource (Boost.STM). If I'm not wrong, Boost.Persistent already contains a Resource concept. I would like to see how this transactional framework can be made generic so the three libraries can share the same transaction. Stefan, Bob, are you interested in participating in such a framework?
Sure, this is what I tried by defining the concepts, although I was never sure whether those are sufficient for all conceivable transactional resources. https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/... https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/...

What's still missing from there is an interface to conduct transactions which involve more than 1 resource, e.g. something like this in Berkeley: http://pybsddb.sourceforge.net/api_c/txn_prepare.html http://pybsddb.sourceforge.net/api_c/txn_recover.html (What they call a "local transaction manager" I call a "resource manager", and what they call a "global transaction manager" I call a "transaction manager".)

We'd also have to find common ground on what Bob calls, I think, a "row_level_lock_contention" and I call an "isolation_exception", and I'm sure there is an equivalent in Boost.STM.

Bob Walters wrote:
I have a library I would like to submit for inclusion in Boost. This message is just soliciting for interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.
The library is an embedded database, along the lines of products like the embedded version of InnoDB, BerkeleyDB, etc. In this case, the database is structured as a Boost.Interprocess shared region (typically mmapped so it can page) which contains one or more STL-like containers holding application-defined data types in directly usable form (i.e. not serialized). The database infrastructure is fully ACID compliant, with durability via write-ahead logging and periodic checkpoints. More details, examples, and documentation on this library's capabilities are available at http://stldb.sourceforge.net/.
Currently, I've only implemented map<> (probably the most useful container type in a database), but the plan is to continue to add containers as requested/needed until ACID compliant versions of a decent portion of the STL are available.
Very interesting. I've written a std::map interface to Berkeley DB, which provides quite some functionality. Drawing from that, I have a couple of questions:

1) Transactional semantics: wouldn't it be easier to steal semantics from locks in threads? E.g., for the synchronous interface case, wouldn't

    map_type m_map;
    try {
        scoped_transaction trans( m_map );
        // .. do stuff with the map ..
        trans.commit();
    } catch( transaction_error ) {
    }

be easier than passing the transaction everywhere?

2) What serialization models are you considering? I.e., for a map of ints to doubles, serialization would be overkill, wouldn't it?

3) Have you considered things like key prefix-compression and storing keys and values in different files?

4) How did you solve "map[ key ] = value" vs "something = map[ key ]"? Here, I resorted to a reference object that would do the .put() in case of assignment, and a .get() in case of an implicit conversion.

5) Do you reach 200,000 transactions per second per thread? :-)

What would be really nice is something like a stored version of multi_index. Thanks, Cheers, Rutger

On Fri, Jan 8, 2010 at 10:39 AM, Rutger ter Borg <rutger@terborg.net> wrote:
Very interesting. I've written a std::map interface to Berkeley DB, which provides quite some functionality. Drawing from that, I have a couple of questions:
1) Transactional semantics: wouldn't it be easier to steal semantics from locks in threads? E.g., for the synchronous interface case, wouldn't
    map_type m_map;
    try {
        scoped_transaction trans( m_map );
        // .. do stuff with the map ..
        trans.commit();
    } catch( transaction_error ) {
    }
be easier than passing the transaction everywhere?
Yes. I plan to add support for this, but I also think I'm going to keep the explicit model for two reasons: 1) the odd chance that a particular thread wants to work with multiple outstanding transactions, and 2) it makes it easy for me to write test cases for the many integrity-related tests that I need. I.e. I can write a test which does:

    transaction txn1( db );
    transaction txn2( db );
    map->insert( entry, txn1 );
    assert( map->find( entry, txn1 ) != map->end() );
    assert( map->find( entry, txn2 ) == map->end() );
2) what serialization models are you considering? I.e., for a map of int to doubles, serialization would be overkill, wouldn't it?
Yes. I've thought about having specializations to support different serialization models, so that a complex type might use Boost.Serialization, but concrete types can use direct byte copies, etc. This isn't done yet. Currently, I'm using map<string,string> in actual applications, so I haven't put any work into alternatives.
3) have you considered things like key prefix-compression and storing keys and values in different files?
I have considered memory-localization controls to cluster map nodes and keys into certain segments of a region, and leave values to use all other memory in the region. The point of this is, for memory maps that can't be held memory resident, to establish hot spots which can stay paged in, thus ensuring only one page-in when doing a find() and value access. But nothing more than that.
4) how did you solve "map[ key ] = value" vs something = map[ key ]? Here, I resorted to a reference object that would do the .put() in case of assignment, a .get() in case of an implicit conversion.
Actually, I haven't implemented that method yet. I need an implicit transaction-passing technique before it can be implemented. The approach will probably be to require a scoped lock on the map for the duration of that call.
5) do you reach 200,000 transactions per second per thread? :-)
I'm assuming you realize that the answer to this would depend on the transaction composition, the speed of the machine, and the number of values in the map during the test. ;) What I can say is that I will be running a comparative test between the maps in this database and an equivalent, multi-threaded, heap-based use of a std::map in the same way. All disk I/O in STLdb can be suppressed for the purpose of such testing, allowing an apples-to-apples comparison that should show how much overhead I am adding with the transactional infrastructure, and how I may be negatively affecting concurrency. The apps I use to do this will be checked in with the project, to support repeatability.

On Friday 08 January 2010 16:39:27, Rutger ter Borg wrote:
1) Transactional semantics: wouldn't it be easier to steal semantics from locks in threads? E.g., for the synchronous interface case, wouldn't
    map_type m_map;
    try {
        scoped_transaction trans( m_map );
        // .. do stuff with the map ..
        trans.commit();
    } catch( transaction_error ) {
    }
be easier than passing the transaction everywhere?
Implementing this requires a call to thread_specific_ptr::get() on each operation to obtain the active transaction. Unfortunately, thread_specific_ptr is implemented using pthread calls and a std::map lookup, so this consumes > 6% CPU in one of my test cases. And this is a real-world test case with other expensive stuff; in cases that e.g. only read cached objects it's probably even worse. Is there any chance of a thread_specific_ptr implementation based on GCC's __thread and MSVC's __declspec(thread)? __thread results in a simple read access using a thread-specific memory segment.

----- Original Message -----
From: "Stefan Strasser" <strasser@uni-bremen.de>
To: <boost@lists.boost.org>
Sent: Friday, January 08, 2010 7:37 PM
Subject: [boost] [thread] thread_specific_ptr performance (was: Re: Boost library submission (poll for interest))
On Friday 08 January 2010 16:39:27, Rutger ter Borg wrote:
1) Transactional semantics: wouldn't it be easier to steal semantics from locks in threads? E.g., for the synchronous interface case, wouldn't
    map_type m_map;
    try {
        scoped_transaction trans( m_map );
        // .. do stuff with the map ..
        trans.commit();
    } catch( transaction_error ) {
    }
be easier than passing the transaction everywhere?
Implementing this requires a call to thread_specific_ptr::get() on each operation to obtain the active transaction.
Unfortunately, thread_specific_ptr is implemented using pthread calls and a std::map lookup, so this consumes > 6% CPU in one of my test cases. And this is a real-world test case with other expensive stuff; in cases that e.g. only read cached objects it's probably even worse.
Is there any chance of a thread_specific_ptr implementation based on GCC's __thread and MSVC's __declspec(thread)?
__thread results in a simple read access using a thread-specific memory segment.
Hi, as Anthony has already stated on other threads, this seems to be not possible (the semantics are different). What about defining a macro BOOST_LOCAL_THREAD that handles the needed portability issues? Have you made some measurements with __thread and/or MSVC's __declspec(thread)? BTW, Anthony has made some improvements to thread_specific_ptr in 1.41. Which version are you using? Best, Vicente

On Friday 08 January 2010 20:36:46, vicente.botet wrote:
Is there any chance of a thread_specific_ptr implementation based on GCC's __thread and MSVC's __declspec(thread)?
__thread results in a simple read access using a thread-specific memory segment.
Hi, as Anthony has already stated on other threads, this seems to be not possible (the semantics are different).
Could you point me towards that thread? The one about dynamically loaded libraries?
What about defining a macro BOOST_LOCAL_THREAD that handles the needed portability issues? Have you made some measurements with __thread and/or MSVC's __declspec(thread)?
I'm not sure you can bring together all the different semantics of those extensions using a macro. The proposal for C++0x thread-local storage is about 10 pages: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2659.htm
BTW, Anthony has made some improvements to thread_specific_ptr in 1.41. Which version are you using?
1.41

----- Original Message -----
From: "Stefan Strasser" <strasser@uni-bremen.de>
To: <boost@lists.boost.org>
Sent: Saturday, January 09, 2010 1:19 AM
Subject: Re: [boost] [thread] thread_specific_ptr performance (was: Re: Boost library submission (poll for interest))
On Friday 08 January 2010 20:36:46, vicente.botet wrote:
Is there any chance of a thread_specific_ptr implementation based on GCC's __thread and MSVC's __declspec(thread)?
__thread results in a simple read access using a thread-specific memory segment.
Hi, as Anthony has already stated on other threads, this seems to be not possible (the semantics are different).
Could you point me towards that thread? The one about dynamically loaded libraries?
http://old.nabble.com/-thread--TSS-cleanup-tt19590361.html#a19617150
What about defining a macro BOOST_LOCAL_THREAD that handles the needed portability issues? Have you made some measurements with __thread and/or MSVC's __declspec(thread)?
I'm not sure you can bring together all the different semantics of those extensions using a macro. The proposal for C++0x thread-local storage is about 10 pages: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2659.htm
Well, the proposal may have 10 pages, but the syntactical addition is quite simple; the problem is that each compiler has its own syntax.
BTW, Anthony has made some improvements to thread_specific_ptr in 1.41. Which version are you using?
1.41
Have you made some measurements comparing thread_specific_ptr and __thread, for example? If there is a big difference, maybe you could file a ticket. Best, Vicente

On 01/09/2010 03:19 AM, Stefan Strasser wrote:
Hi, as Anthony has already stated on other threads, this seems to be not possible (the semantics are different).
could you point me towards that thread? the one about dynamically loaded libraries?
I don't have a discussion thread at hand, but these pages of MSDN may be of interest to you: http://tinyurl.com/ycmwwmd http://tinyurl.com/y8nfoal

In short:
* The TLS must have static duration.
* Only POD types are supported as TLS declared with __declspec(thread).
* It doesn't work if the TLS is defined in a delay-loaded DLL.

I'm not sure what the constraints are for __thread in GCC, but it looks like it too supports only POD types with static duration. http://tinyurl.com/yae8h8c

Also, AFAIU, these mechanisms are based on OS-provided APIs, which may have rather constrained TLS capacity. With the Boost.Thread solution there's virtually no limit on the number of thread-specific variables you may have.

On Saturday 09 January 2010 13:22:23, Andrey Semashev wrote:
In short:
* The TLS must have static duration.
* Only POD types are supported as TLS declared with __declspec(thread).
* It doesn't work if the TLS is defined in a delay-loaded DLL.
I see. But while that is a good reason to have a std::map (or unordered_map) in Boost, I still don't get why you can't use thread-local storage for saving the pointer to the map (i.e. the result of get_current_thread_data()). I can't say I understand all the details of the current implementation, but in case pthread_setspecific etc. needs to be kept for some reason related to construction/destruction, it seems to me that at the very least the result of get_current_thread_data() could be cached in a thread-local variable. That would speed up thread_specific_ptr::get() 5-fold according to my profiling.

On Saturday 09 January 2010 12:48:39, Rutger ter Borg wrote:
Stefan Strasser wrote:
implementing this requires a call to thread_specific_ptr::get() on each operation to obtain the active transaction.
Isn't this under the assumption that such a container should be thread-safe by default?
Even if you exclusively lock the container before every access, even if you stop all other threads, the container still needs to obtain the active transaction, set by the constructor of transaction_scope in your example. The only assumption is that there is more than one thread that creates a transaction_scope object, but I think you have to assume that.

On Saturday 09 January 2010 01:22:25, vicente.botet wrote:
the proposal of C++0x thread local storage is about 10 pages: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2659.htm
Well, the proposal may have 10 pages, but the syntactical addition is quite simple; the problem is that each compiler has its own syntax.
Hiding the syntax behind a BOOST macro shouldn't be a problem, but I doubt both extensions (and Intel's, and ...) have the same semantics regarding construction, destruction, what types can be used, etc.
Have you made some measurements comparing thread_specific_ptr and __thread, for example? If there is a big difference, maybe you could file a ticket.
No, I have not, but there is no need to. Accessing a

    __thread T *ptr;

results in something like:

    movl %gs:ptr, %eax

On 01/09/2010 02:23 PM, Stefan Strasser wrote:
On Saturday 09 January 2010 13:22:23, Andrey Semashev wrote:
I see. but while that is a good reason to have a std::map (or unordered_map) in boost, I still don't get why you can't use thread local storage for saving the pointer to the map (i.e. the result of get_current_thread_data()).
I can't say I understand all the details of the current implementation, but in case pthread_setspecific etc. needs to be kept for some reason related to construction/destruction, it seems to me at the very least the result of get_current_thread_data() could be cached in a thread-local variable.
The pointer to the thread-specific data is stored in TLS, last time I checked. And there's no difference in terms of performance between the ways you store the pointer - via the __thread specifier or by manually calling pthread APIs. In the case of __thread, the compiler generates the necessary calls to pthread for you; that's all you win, AFAIU.
that would speed up thread_specific_ptr::get() 5-fold according to my profiling.
Hmm, that's unexpected. Could you post a patch to test with?

On Saturday 09 January 2010 13:58:35, Andrey Semashev wrote:
The pointer to the thread-specific data is stored in TLS, last time I checked. And there's no difference in terms of performance in how you store the pointer - via __thread specifier or by manually calling pthread APIs. In case of __thread the compiler will generate the necessary calls to pthread for you, that all you win, AFAIU.
How do you come to that conclusion?

    __thread int *ptr;
    int *get_current_thread_data() { return ptr; }
    _Z23get_current_thread_datav:
    .LFB2:
        pushl %ebp
    .LCFI0:
        movl %gs:ptr@NTPOFF, %eax
        movl %esp, %ebp
    .LCFI1:
        popl %ebp
        ret
Boost's find_tss_data, on the other hand (kcachegrind doesn't let me copy that table, so here is a hand-typed copy of the most important calls):

    find_tss_data
    incl. - excl. - function
    77% - 10% - get_current_thread_data
    50% - 15% - boost::call_once
    33% - 13% - get_once_per_thread_epoch
    31% - 28% - pthread_getspecific
that would speed up thread_specific_ptr::get() 5-fold according to my profiling.
Hmm, that's unexpected. Could you post a patch to test with?
I don't have a patch; it was an estimate based on the profile above.

On 01/09/2010 02:57 PM, Stefan Strasser wrote:
On Saturday 09 January 2010 13:58:35, Andrey Semashev wrote:
The pointer to the thread-specific data is stored in TLS, last time I checked. And there's no difference in terms of performance in how you store the pointer - via __thread specifier or by manually calling pthread APIs. In case of __thread the compiler will generate the necessary calls to pthread for you, that all you win, AFAIU.
how do you come to that conclusion?
    __thread int *ptr;
    int *get_current_thread_data() { return ptr; }
    _Z23get_current_thread_datav:
    .LFB2:
        pushl %ebp
    .LCFI0:
        movl %gs:ptr@NTPOFF, %eax
        movl %esp, %ebp
    .LCFI1:
        popl %ebp
        ret
Interesting. It seems that GCC is smarter than I thought, at least on x86. However, __thread doesn't allow registering a cleanup function, which is required by Boost.Thread. You could create a dummy TLS slot with the pthread API for that, but then I'm not sure if you could reliably access __thread variables from that function.

On 01/09/2010 04:55 PM, Andrey Semashev wrote:
On 01/09/2010 02:57 PM, Stefan Strasser wrote:
On Saturday 09 January 2010 13:58:35, Andrey Semashev wrote: However, __thread doesn't allow registering a cleanup function, which is required by Boost.Thread. You could create a dummy TLS slot with the pthread API for that, but then I'm not sure if you could reliably access __thread variables from that function.
On second thought, yes, it seems quite possible to store the pointer in both TLSs - the pthread one and the __thread one - and register the cleanup for the former. Good point you have!

On Saturday 09 January 2010 12:57:33, you wrote:
    __thread int *ptr;
    int *get_current_thread_data() { return ptr; }
    _Z23get_current_thread_datav:
    .LFB2:
        pushl %ebp
    .LCFI0:
        movl %gs:ptr@NTPOFF, %eax
        movl %esp, %ebp
    .LCFI1:
        popl %ebp
        ret
I've tried the same thing on MSVC with __declspec(thread) and the result is similar.

Even though I understand the need for it now, it still seems odd that we use a std::map simply to access a value in another memory segment. Have you thought about dynamically allocating an index into a (thread-specific) vector instead of a key based on the thread_specific_ptr's address? IIUC that would allow constant-time access in all cases.

There are some similarities to Boost.Interprocess shared memory segments; the dynamically allocated index would sort of be an offset_ptr into a "thread-specific segment", with the difference that the objects themselves can be stored outside of the segment (which makes "growing the segment" a lot easier) and that a "thread-specific segment" must be duplicated when a new thread is started (or on first access). But this seems solvable, and if the allocator is able to store not only destruction but also construction information, thread-specific instances could be supported in addition to thread-specific pointers.

On 01/11/2010 05:22 PM, Stefan Strasser wrote:
even though I understand the need for it now, it still seems odd that we use a std::map to simply access a value in another memory segment.
have you thought about dynamically allocating an index into a (thread-specific) vector instead of a key based on the thread_specific_ptr's address? IIUC that would allow constant-time access in all cases.
Yes, that's what I did in my patch in ticket #2361. One inconvenience with that approach is constraining the size of the vector if thread_specific_ptrs are constantly created/destroyed. I did not solve that in my patch, but it should be quite doable. I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?

Andrey Semashev wrote:
Yes, that's what I did in my patch in ticket #2361. One inconvenience with that approach is constraining the size of the vector if thread_specific_ptrs are constantly created/destroyed. I did not solve that in my patch, but it should be quite doable.
I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?

I'm just in the process of doing something much the same myself. I don't think limiting the number of slots is really an issue in practice if you can allow a 4k page for each thread. There's no real need for each library to use more than one slot. James

Andrey Semashev <andrey.semashev@gmail.com> writes:
On 01/11/2010 05:22 PM, Stefan Strasser wrote:
even though I understand the need for it now, it still seems odd that we use a std::map to simply access a value in another memory segment.
have you thought about dynamically allocating an index into a (thread-specific) vector instead of a key based on the thread_specific_ptr's address? IIUC that would allow constant-time access in all cases.
Yes, that's what I did in my patch in ticket #2361. One inconvenience with that approach is to restrain the size of the vector if thread_specific_ptrs are constantly created/destroyed. I did not solve it in my patch but it should be quite doable.
I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?
Boost 1.35 used a vector for the thread_specific_ptr data, but there were complaints about the excessive memory usage. The map version has a smaller memory footprint. It is possible that an alternative scheme (such as using a sorted vector as a map) might yield something that is more reasonable on both fronts.

Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++0x thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Bugzilla from anthony.ajw@gmail.com wrote:
Andrey Semashev <andrey.semashev@gmail.com> writes:
On 01/11/2010 05:22 PM, Stefan Strasser wrote:
even though I understand the need for it now, it still seems odd that we use a std::map to simply access a value in another memory segment.
have you thought about dynamically allocating an index into a (thread-specific) vector instead of a key based on the thread_specific_ptr's address? IIUC that would allow constant-time access in all cases.
Yes, that's what I did in my patch in ticket #2361. One inconvenience with that approach is to restrain the size of the vector if thread_specific_ptrs are constantly created/destroyed. I did not solve it in my patch but it should be quite doable.
I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?
Boost 1.35 used a vector for the thread_specific_ptr data, but there were complaints about the excessive memory usage. The map version has a smaller memory footprint.
It is possible that an alternative scheme (such as using a sorted vector as a map) might yield something that is more reasonable on both fronts.
Anthony
I would provide the thread_specific_ptr interface at several levels:
* 1st: use the specific 3pp/OS library interface directly (the number of instances is limited by the 3pp/OS). The cost depends on the 3pp, but is usually constant. The key usually needs less than 16 bits.
* 2nd: the library could use a fixed-size array, allowing the number of TSS instances to be extended. The size of this direct-access array need not be too big (256). The cost is that of the 1st level + one indirection + constant access to the array. The key usually needs less than 16 bits.
* 3rd: the library could use a dynamically sized map, as now, avoiding any limit. The cost is that of the 1st level + one indirection + log(N) access to the map. If the key is the address, the key takes sizeof(void*).
The 1st level should be reserved for generic libraries that could be used by any application with high-frequency use of the get function. I'm thinking, for example, of the current transaction in a transaction-based application, the current log, ...
We could also add another kind that tries the best level starting from a given one: try the 1st level; if no 3pp/OS key is available, continue with the 2nd level, and if that is not available, continue with the 3rd level. I'm sure we could store these keys in 32 bits on 32-bit machines. We need just two bits to indicate the kind of key: 0 = 1st level, 1 = 2nd level, 2 = 3rd level.
Just my 2cts
Vicente

Zitat von Anthony Williams <anthony.ajw@gmail.com>:
I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?
Boost 1.35 used a vector for the thread_specific_ptr data, but there were complaints about the excessive memory usage. The map version has a smaller memory footprint.
1.35 also uses the thread_specific_ptr's address as a key, but searches for it in a list, and as far as I can see nodes of that list are not reused, which is probably the reason for the complaints. what we meant above was allocating an index on thread_specific_ptr construction, not using the address as a key into a vector. those indexes could be reused, as andrey indicated, by maintaining a free-list, so there is no excessive memory usage.
Zitat von Vicente Botet Escriba <vicente.botet@wanadoo.fr>:
I would provide the thread_specific_ptr interface at several levels:
I don't see why that would be necessary. the vector can be reallocated at any time without a mutex since it is thread-specific, so your "2nd level" can be used for an unlimited number of thread_specific_ptrs.
thread_specific_ptr constructor: mutex lock, either getting an index from the free-list or using the end of the vector. constant time.
thread_specific_ptr destructor: mutex lock, add the index to the free-list. constant time.
thread_specific_ptr operator*: one branch to make sure the vector is large enough (a new thread_specific_ptr might have been created by another thread), one indirection. constant time on average; linear in the vector if reallocation is necessary, but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided, with some effort and a second indirection (using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction), but to me the branch is acceptable.

Stefan Strasser-2 wrote:
Zitat von Anthony Williams <anthony.ajw@gmail.com>:
I really hope Anthony will take a look at it and come up with a vector-based solution. Or I can finish the patch myself, if it has any chance of getting into SVN. Anthony?
Boost 1.35 used a vector for the thread_specific_ptr data, but there were complaints about the excessive memory usage. The map version has a smaller memory footprint.
1.35 also uses the thread_specific_ptr's address as key, but searches for it in a list. and as far as I can see nodes of that list are not reused, which is probably the reason for the complaints.
what we meant above was allocating an index on thread_specific_ptr construction, not using the address as a key into a vector. those indexes could be reused as andrey indicated by maintaining a free-list, so there is no excessive memory usage.
Zitat von Vicente Botet Escriba <vicente.botet@wanadoo.fr>:
I would provide the thread_specific_ptr interface at several levels:
I don't see why that would be necessary.
The 1st level provides the best performance, which can be needed in some contexts.
Stefan Strasser-2 wrote:
the vector can be reallocated at any time without a mutex since it is thread-specific, so your "2nd level" can be used for an unlimited number of thread_specific_ptrs.
thread_specific_ptr constructor: mutex lock, either getting an index from the free-list or using end of vector. constant-time.
thread_specific_ptr destructor: mutex lock, add index to free-list. constant-time.
thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided with some effort and a second indirection(using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction) but to me the branch is acceptable.
For me it is unacceptable for operator* to reallocate the vector. Moreover, any non-constant-time operator* doesn't satisfy my requirements for some specific contexts.
Vicente

Vicente Botet Escriba wrote:
For me it is unacceptable to use reallocation of the vector on the operator*.
Operator* has a non-NULL precondition and doesn't need to reallocate. reset() does, but only if you set a thread_specific_ptr that has been created after the thread has started.
More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I don't think that any implementation gives you this constant time guarantee if you continually create thread-specific variables.

Peter Dimov-5 wrote:
Vicente Botet Escriba wrote:
For me it is unacceptable to use reallocation of the vector on the operator*.
Operator* has a non-NULL precondition and doesn't need to reallocate. reset() does, but only if you set a thread_specific_ptr that has been created after the thread has started.
I was responding to the proposition of Stefan, not to the current implementation. Peter Dimov-5 wrote:
More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I don't think that any implementation gives you this constant time guarantee if you continually create thread-specific variables.
When you say "continually create thread-specific variables", do you mean without removing them? In that case you will reach the limit of keys, but this seems to me a use case that doesn't make much sense. pthread provides this for a limited number of keys, doesn't it?
Vicente

Zitat von Vicente Botet Escriba <vicente.botet@wanadoo.fr>:
Peter Dimov-5 wrote:
Vicente Botet Escriba wrote:
For me it is unacceptable to use reallocation of the vector on the operator*.
Operator* has a non-NULL precondition and doesn't need to reallocate. reset() does, but only if you set a thread_specific_ptr that has been created after the thread has started.
I was responding to the proposition of Stefan, not to the current implementation.
peter's right, no reallocation in operator*.

T& thread_specific_ptr::operator*(){
    BOOST_ASSERT(tss_vec.size() > this->index);
    return *tss_vec[this->index];
}
T* thread_specific_ptr::get(){
    if(tss_vec.size() > this->index) return tss_vec[this->index];
    else return 0;
}
void thread_specific_ptr::reset(T *ptr){
    std::auto_ptr<T> aptr(ptr);
    if(tss_vec.size() <= this->index) tss_vec.resize(this->index+1);
    tss_vec[this->index]=aptr.release();
}
My concrete example is to access the current transaction on a Software Transaction Memory. This operation can be required frequently. You should
but reallocation only happens once per thread, on first call to reset().
I have no access to the code now. Please, could you show where the current implementation allocates and uses a mutex in operator*?
probably also only on reset(), the code is in libs/src/pthread/thread.cpp. it inserts an element into a std::map, which allocates, which acquires a mutex.
A library cannot know whether its users work on a Boost.Thread or on a native thread, so thread_specific_ptr needs to work in both cases. The current implementation of thread_specific_ptr already satisfies this; any alternative proposal needs to satisfy this requirement as well.
ok, I wasn't sure.

----- Original Message ----- From: <strasser@uni-bremen.de> To: <boost@lists.boost.org> Sent: Tuesday, January 12, 2010 4:58 PM Subject: Re: [boost] [thread] thread_specific_ptr performance Zitat von Vicente Botet Escriba <vicente.botet@wanadoo.fr>:
Peter Dimov-5 wrote:
Vicente Botet Escriba wrote:
For me it is unacceptable to use reallocation of the vector on the operator*.
Operator* has a non-NULL precondition and doesn't need to reallocate. reset() does, but only if you set a thread_specific_ptr that has been created after the thread has started.
I was responding to the proposition of Stefan, not to the current implementation.
peter's right, no reallocation in operator*. <snip code>
My concrete example is to access the current transaction on a Software Transaction Memory. This operation can be required frequently. You should
but reallocation only happens once per thread, on first call to reset().
I have no access to the code now. Please, could you show where the current implementation allocates and uses a mutex in operator*?
probably also only on reset(), the code is in libs/src/pthread/thread.cpp. it inserts an element into a std::map, which allocates, which acquires a mutex.
A library cannot know whether its users work on a Boost.Thread or on a native thread, so thread_specific_ptr needs to work in both cases. The current implementation of thread_specific_ptr already satisfies this; any alternative proposal needs to satisfy this requirement as well.
ok, I wasn't sure.
Stefan, I have never talked about the reset function but about operator*. Just quoting yourself: "thread_specific_ptr operator*: one branch to make sure the vector is large enough (a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created."
From this I deduce that the operator* you were proposing did a reallocation. Please don't mix up allocation in the map with the reallocation of the vector you were proposing.
Vicente

Am Tuesday 12 January 2010 21:10:57 schrieb vicente.botet:
Stefan, I have never talk about reset function but the operator*. Just quoting yourself "thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created."
like I said, "peter's right, no reallocation in operator*." I should have differentiated between get(), reset() and operator* in that description. the fact is that only reset() needs to reallocate, and the discussion of possibly changing the implementation of thread_specific_ptr should be continued on that basis, no matter what I mistakenly wrote before. sorry for the confusion.

----- Original Message ----- From: "Stefan Strasser" <strasser@uni-bremen.de> To: <boost@lists.boost.org> Sent: Tuesday, January 12, 2010 10:31 PM Subject: Re: [boost] [thread] thread_specific_ptr performance Am Tuesday 12 January 2010 21:10:57 schrieb vicente.botet:
Stefan, I have never talk about reset function but the operator*. Just quoting yourself "thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created."
like I said, "peter's right, no reallocation in operator*." I should have differentiated between get(), reset() and operator* in that description. the fact is that only reset() needs to reallocate, and the discussion of possibly changing the implementation of thread_specific_ptr should be continued on that basis, no matter what I mistakenly wrote before. sorry for the confusion.
Doesn't matter. In order to follow the discussion, would you agree to post a clear proposal using your pages design?
Best,
Vicente

Am Tuesday 12 January 2010 21:39:16 schrieb vicente.botet:
In order to follow the discussion do you agree to post a clear proposal using your pages design?
the "pages" proposal was a result of our misunderstanding, I think a vector-based implementation is sufficient. here's a prototype implementation:
http://www.boostpro.com/vault/index.php?action=downloadfile&filename=tss.hpp&directory=&
it implements all functions of a thread_specific_ptr except for custom cleanup functions (for now). it also fixes the bug I posted in a previous mail.
thread_specific_ptr time complexities:
constructor: constant
destructor: constant average, occasional vector reallocation
operator*: constant
operator->: constant
release(): constant
get(): constant
reset(): undefined. operations by the tss code itself are constant average, but it may call a user-supplied destructor or pthread functions.
below is the result of operator*:

prototype::thread_specific_ptr<int> ptr;
int f(){ return *ptr; }
_Z1fv:
.LFB4216:
    pushl %ebp
.LCFI26:
    movl %gs:_ZN5boost9prototype6detail10tss_vectorE@NTPOFF, %edx
    movl ptr+4, %eax
    movl %esp, %ebp
.LCFI27:
    popl %ebp
    movl (%edx), %edx
    leal (%eax,%eax,2), %eax
    movl (%edx,%eax,4), %eax
    movl (%eax), %eax
    ret

Am Wednesday 13 January 2010 15:08:29 schrieben Sie:
Am Tuesday 12 January 2010 21:39:16 schrieb vicente.botet:

prototype::thread_specific_ptr<int> ptr;
int f(){ return *ptr; }
_Z1fv:
.LFB4216:
    pushl %ebp
.LCFI26:
    movl %gs:_ZN5boost9prototype6detail10tss_vectorE@NTPOFF, %edx
    movl ptr+4, %eax
    movl %esp, %ebp
.LCFI27:
    popl %ebp
    movl (%edx), %edx
    leal (%eax,%eax,2), %eax
    movl (%edx,%eax,4), %eax
    movl (%eax), %eax
    ret
one of the indirections could be removed by storing &vector[0] in the thread-local storage instead of &vector.

Vicente Botet Escriba wrote:
More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I don't think that any implementation gives you this constant time guarantee if you continually create thread-specific variables.
When you say "continually create thread-specific variables" do you mean without removing them.
Stefan's idea was to reuse slots in the vector. This would avoid reallocation when keys are created and destroyed at the same rate. Unfortunately, on second thought, this isn't as straightforward as it seemed; threads could still hold non-NULL pointers when the key is destroyed. pthread_key_delete simply discards these non-NULL pointers without cleaning them up; the current implementation performs proper cleanup when the threads end.

Zitat von Peter Dimov <pdimov@pdimov.com>:
Stefan's idea was to reuse slots in the vector. This would avoid reallocation when keys are created and destroyed at the same rate. Unfortunately, on second thought, this isn't as straightforward as it seemed; threads could still hold non-NULL pointers when the key is destroyed.
pthread_key_delete simply discards these non-NULL pointers without cleaning them up; the current implementation performs proper cleanup when the threads end.
so the current implementation destroys the thread-specific objects on thread exit even though the thread_specific_ptr they belonged to was destroyed before? I guess this was the reason to introduce a "key" in the first place? a similar behaviour could be implemented using the vector technique: the key would have to be stored in the vector at the allocated index along with the pointer, so thread_specific_ptr::get() can decide whether the pointer is left over from a previous thread_specific_ptr or an actual pointer belonging to this thread_specific_ptr, and reset() can destroy it.

strasser@uni-bremen.de wrote:
I guess this was the reason to introduce a "key" in the first place? a similar behaviour could be implemented using the vector-technique. the key would have to be stored in the vector at the allocated index along with the pointer so thread_specific_ptr::get() can decide if the pointer is left-over from a previous thread_specific_ptr or an actual pointer belonging to this thread_specific_ptr, and reset() can destroy it.
This works (at the expense of doubling the vector size, but it would still beat a map). You'd also need the old cleanup function though.

Am Tuesday 12 January 2010 18:36:43 schrieb Peter Dimov:
strasser@uni-bremen.de wrote:
I guess this was the reason to introduce a "key" in the first place? a similar behaviour could be implemented using the vector-technique. the key would have to be stored in the vector at the allocated index along with the pointer so thread_specific_ptr::get() can decide if the pointer is left-over from a previous thread_specific_ptr or an actual pointer belonging to this thread_specific_ptr, and reset() can destroy it.
This works (at the expense of doubling the vector size, but it would still beat a map). You'd also need the old cleanup function though.
the use of keys in the current implementation is incorrect anyway, I think. a unique key obtained in the thread_specific_ptr constructor should be used instead. the bug can be triggered by a thread_specific_ptr being allocated at the same address where there already was another one (adding synchronization to this doesn't make a difference):

optional< thread_specific_ptr<int> > optr;

void other_thread(){
    optr=none;
    optr=in_place();
}

int main(){
    optr=in_place();
    optr->reset(new int);
    thread(bind(&other_thread));
    this_thread::sleep(posix_time::seconds(5));
    assert(optr->get() == 0); //fails!
}

the assertion should not fail.

Peter Dimov wrote:
I don't think that any implementation gives you this constant time guarantee if you continually create thread-specific variables.
Why does any application, ever, need to even be allowed to 'continually create thread-specific variables' without being considered badly broken? Thread-specific variables are a proxy for partitioned global data - and that doesn't get created and destroyed except with DLL loading and unloading. And DLL unloading is extremely dodgy in C++ as it is. If you have object instances (say, some sort of rule interpreter) which want to have thread-specific state, then really it's better handled with a thread-specific map between the object instance address and its state instance.
James

James Mansion wrote:
Peter Dimov wrote:
I don't think that any implementation gives you this constant time guarantee if you continually create thread-specific variables.
Why does any application, ever, need to even be allowed to 'continually create thread-specific variables' without being considered badly broken? Thread-specific variables are a proxy for partitioned global data - and that doesn't get created and destroyed except with DLL loading and unloading. And DLL unloading is extremely dodgy in C++ as it is.
__thread, __declspec(thread) and C++0x thread_local variables are indeed restricted to global scope. But boost::thread_specific_ptr isn't, and people have been using it as a class member. I can see nothing inherently wrong with such a use, even though it isn't "conventional".
If you have object instances (say, some sort of rule interpreter) which wants to have thread-specific state then really its better handled with a thread-specific map between the object instance address and its state instance.
This is basically what thread_specific_ptr currently does.

Am Tuesday 12 January 2010 15:17:05 schrieb Vicente Botet Escriba:
thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided with some effort and a second indirection(using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction) but to me the branch is acceptable.
For me it is unacceptable to use reallocation of the vector on the operator*. More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I'd prefer reallocation, but reallocation can be avoided using pages, at the cost of a second or third indirection:

struct page{ user_ptr ptr[0x10000]; };

page *pages[0x10000];

user_ptr &operator*(){
    size_t pagenr=this->index >> 16;
    page *p=pages[pagenr];
    if(!p) p=pages[pagenr]=new page;
    return p->ptr[this->index & 0xffff];
}

if 64K*sizeof(user_ptr) per page is too much, that can be reduced by setting a reasonable maximum below 4 billion or by using a third indirection. could you elaborate on your case that can't accept reallocation? I can't think of one. as long as there is an allocation (and there is one in the current implementation, too), there's a mutex lock and no guarantee on the time it takes to return anyway.

Am Tuesday 12 January 2010 16:33:31 schrieben Sie:
Am Tuesday 12 January 2010 15:17:05 schrieb Vicente Botet Escriba:
thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided with some effort and a second indirection(using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction) but to me the branch is acceptable.
For me it is unacceptable to use reallocation of the vector on the operator*. More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I'd prefer reallocation, but reallocation can be avoided using pages at the cost of a second or third indirection.
plus, as I've said, it would be possible to do the work on construction of a thread_specific_ptr for all threads, so there is no branch in operator*. however, that depends on whether the boost thread API can be used together with other thread APIs, because the pages must be initialized on start of a new thread. so only boost threads could access thread_specific_ptrs; accessing a thread_specific_ptr from a natively created thread would fail.

Stefan Strasser-2 wrote:
Am Tuesday 12 January 2010 16:33:31 schrieben Sie:
Am Tuesday 12 January 2010 15:17:05 schrieb Vicente Botet Escriba:
thread_specific_ptr operator*: one branch to make sure the vector is large enough (a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided with some effort and a second indirection (using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction) but to me the branch is acceptable.
For me it is unacceptable to use reallocation of the vector on the operator*. More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I'd prefer reallocation, but reallocation can be avoided using pages at the cost of a second or third indirection.
plus, as I've said it could be possible to do the work on construction of a thread_specific_ptr for all threads so there is no branch in operator*.
I have said nothing about that. I just want a constant-time operator*().
however, that depends on if the boost thread API can be used together with other thread APIs, because then the pages must be initialized on start of a new thread. so only boost threads could access thread_specific_ptrs, accessing a thread_specific_ptr from a natively created thread would fail.
A library cannot know whether its users work on a Boost.Thread or on a native thread, so thread_specific_ptr needs to work in both cases. The current implementation of thread_specific_ptr already satisfies this; any alternative proposal needs to satisfy this requirement as well.
Vicente

Stefan Strasser-2 wrote:
Am Tuesday 12 January 2010 15:17:05 schrieb Vicente Botet Escriba:
thread_specific_ptr operator*: one branch to make sure the vector is large enough(a new thread_specific_ptr might have been created by another thread), one indirection. constant-time average, linear to vector if reallocation is necessary. but that can only happen when a new thread_specific_ptr was created.
I think the branch also could be avoided with some effort and a second indirection(using pages to avoid reallocating and making sure the page exists in each thread on thread_specific_ptr construction) but to me the branch is acceptable.
For me it is unacceptable to use reallocation of the vector on the operator*. More, any non-constant time operator* don't satisfy my requirements for some specific contexts.
I'd prefer reallocation, but reallocation can be avoided using pages at the cost of a second or third indirection.
struct page{ user_ptr ptr[0x10000]; };
page *pages[0x10000];
operator*(){
    size_t pagenr=this->index >> 16;
    page *p=pages[pagenr];
    if(!p) p=pages[pagenr]=new page;
    return p->ptr[this->index & 0xffff];
}
if 64K*sizeof(user_ptr) per page is too much that can be reduced by setting a reasonable maximum below 4 billion or by using a third indirection.
This implementation doesn't suffer from linear or logarithmic complexity, as the page access is constant. An implementation based on pages could satisfy my requirements.
could you elaborate on your case that can't accept reallocation? I can't think of one. as long as there is an allocation (and there is one in the current implementation, too), there's a mutex lock and no guarantee on the time it takes to return anyway.
My concrete example is accessing the current transaction in a Software Transactional Memory. This operation can be required frequently. You should have this also in your Persistent library. IMO, the access to the current transaction must have constant complexity. I have no access to the code now. Please, could you show where the current implementation allocates and uses a mutex in operator*?
Best,
Vicente

Stefan Strasser wrote:
implementing this requires a call to thread_specific_ptr::get() on each operation to obtain the active transaction.
unfortunately thread_specific_ptr is implemented using pthread calls and a std::map lookup, so this consumes > 6% CPU in one of my test cases. and this is a real-world test case with other expensive stuff; in cases that e.g. only read cached objects it's probably even worse.
is there any chance for a thread_specific_ptr implementation based on GCC __thread and MSVC __declspec(thread)?
__thread results in a simple read access using a thread-specific memory segment.
Isn't this under the assumption that such a container should be thread-safe by default? Isn't that requirement orthogonal to the semantics of transactions? FWIW, Asio's io_service also suffers from the computational cost of thread_specific_ptr. Would be great if those costs could be reduced. Cheers, Rutger

On 01/09/2010 02:48 PM, Rutger ter Borg wrote:
Stefan Strasser wrote:
FWIW, Asio's io_service also suffers from the computational cost of thread_specific_ptr. Would be great if those costs could be reduced.
I was concerned with thread_specific_ptr performance, too. Although this was in pre-1.41 times, I still think there's room for improvement. I crafted a patch for 1.40 that provided constant-time complexity for reading/writing TLS instead of the logarithmic complexity that is present now in 1.41. It has its downsides, but if anyone finds it interesting, it's attached to this ticket: https://svn.boost.org/trac/boost/ticket/2361

Bob Walters <bob.s.walters <at> gmail.com> writes:
I have a library I would like to submit for inclusion in Boost. This message is just soliciting for interest per the submission process. If interest is expressed, I'll carry on with a preliminary submission.
[snip] Sorry for coming late to the discussion. Been out of touch (and without internet) for quite a while. Anyway... Let it be said now that I know very little about the internals of DBs of any kind really, so I've skimmed your docs as a potential user. I am really rather excited about the prospects here; I've been looking for a "database engine" for C++ for some time, but so far just managed with SQLite (which is not intended as criticism in any way). One thing that immediately jumped at me, though, is this construct:

Transaction *txn = db.beginTransaction();
{ ... }
db.commit(txn);
From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately. Blame it on Stroustrup and Meyers! :-) Am I missing something here?
As an aside, the quick start guide seems rather longish to accomplish very little, but as you have to go around the Boost.Interprocess part that is probably unavoidable (and you *do* have to go around that part, I think).
Let me know if there's any interest, and I'll work on a preliminary submission for the sandbox.
Consider me interested! :-) /Brian Riis

From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately.
Why not just use:

    std::unique_ptr<Transaction> txn(db.beginTransaction());
    { ... }
    db.commit(*txn);   // or: db.commit(txn.get());

Kizza George Mbidde

mbiddeg@mtn.co.ug wrote:
From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately.
Why not just use:
std::unique_ptr<Transaction> txn(db.beginTransaction()); { ... } db.commit(*txn); //db.commit(txn.get());//
That'd be a start, but I don't think it should be the user's job to provide that safety. Factory functions that return raw pointers are too easy to misuse. The db object could return an auto_ptr (or unique_ptr, or...) instead of a raw pointer, and we'd immediately be better off. /Brian
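The idea can be taken one step further than a smart pointer: a small RAII guard that begins the transaction in its constructor and rolls back in its destructor unless commit() was reached. All names below (`Database`, `Transaction`, `scoped_transaction`) are hypothetical stand-ins for illustration, not STLdb's actual API:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the library's types, just to show the RAII idea.
struct Transaction { bool committed = false; };

struct Database {
    int commits = 0, rollbacks = 0;             // counters for the demo
    std::unique_ptr<Transaction> beginTransaction() {
        return std::unique_ptr<Transaction>(new Transaction);
    }
    void commit(Transaction& t) { t.committed = true; ++commits; }
    void rollback(Transaction&) { ++rollbacks; }
};

// Guard: rolls back on scope exit (including via exception) unless committed.
class scoped_transaction {
    Database& db_;
    std::unique_ptr<Transaction> txn_;
public:
    explicit scoped_transaction(Database& db)
        : db_(db), txn_(db.beginTransaction()) {}
    ~scoped_transaction() {
        if (txn_ && !txn_->committed) db_.rollback(*txn_);
    }
    void commit() { db_.commit(*txn_); }
};
```

With this shape, forgetting db.commit(txn) is no longer silent: leaving scope without committing rolls back, which is exactly the behavior one wants when an exception unwinds the stack.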

On Wed, Jan 13, 2010 at 7:49 PM, Brian Ravnsgaard Riis <brian@ravnsgaard.net> wrote:
One thing that immediately jumped at me, though, is this construct:
Transaction *txn = db.beginTransaction(); { ... } db.commit(txn);
From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately. Blame it on Stroustrup and Meyers! :-) Am I missing something here?
No. I should at least have used an auto_ptr there to ensure destruction. Actually, the whole "database as factory" convention for transactions is something I'm going to rework so that you can have the transaction on the stack if you so choose. I do want to continue supporting explicit transactions, but I'm very tempted by something Stefan is doing in Boost.Persistent, in which transaction scope is retained in thread-specific memory (I assume) and passed around implicitly. Supporting that as an optional approach is something I need to do for those who prefer it.
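The implicit, thread-bound transaction scope described here can be sketched as a stack-allocated object that registers itself in a thread-local "current transaction" slot on construction and unregisters on destruction; library calls then find the active transaction without an explicit parameter. The names below are illustrative guesses, not the API of STLdb or Boost.Persistent:

```cpp
#include <cassert>

// A stack-allocated transaction that publishes itself through a thread-local
// pointer; nesting is handled by remembering the previous "current".
struct transaction {
    static thread_local transaction* current;
    transaction* prev;
    bool committed = false;
    transaction() : prev(current) { current = this; }
    ~transaction() { current = prev; }
    void commit() { committed = true; }
    static transaction& active() { return *current; }  // assumes one exists
};
thread_local transaction* transaction::current = nullptr;

// How deeply nested the calling code currently is (0 = no transaction).
int nesting_depth() {
    int d = 0;
    for (transaction* t = transaction::current; t; t = t->prev) ++d;
    return d;
}
```

Container operations would call transaction::active() internally, which is where the thread-specific-lookup cost of the transparent style comes from.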
Consider me interested! :-)
Thanks. I'm going through a rewrite of some of the checkpoint logic. Once that's confirmed as working, there should be a tarball available.

Bob Walters wrote:
From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately. Blame it on Stroustrup and Meyers! :-) Am I missing something here?
No. I should have at least used an auto there to ensure destruction. [snip]
I can't wait to see that interface.
Consider me interested! :-)
Thanks. I'm going through a rewrite of some of the checkpoint logic. Once that's confirmed as working, there should be a tarball available.
Great! Thanks. /Brian

Bob Walters wrote:
On Wed, Jan 13, 2010 at 7:49 PM, Brian Ravnsgaard Riis <brian@ravnsgaard.net> wrote:
One thing that immediately jumped at me, though, is this construct:
Transaction *txn = db.beginTransaction(); { ... } db.commit(txn);
From an exception safety POV: What happens if db.commit(txn) is never called here? The raw Transaction pointer raises my hackles immediately. Blame it on Stroustrup and Meyers! :-) Am I missing something here?
No. I should have at least used an auto there to ensure destruction. [snip]
Hi, if you can store the transaction on the stack, you could use the atomic language-like macros Stefan provides in his Persistent library. I agree that you should support transparent as well as explicit transactions; transparent transactions have the cost of accessing the thread-specific pointer. Best, Vicente

Am Friday 15 January 2010 13:50:43 schrieb Vicente Botet Escriba:
No. I should have at least used an auto there to ensure destruction. [snip]
Hi, if you can store the transaction on the stack, you could use the atomic language-like macros Stefan provides in his Persistent library.
I agree that you should support transparent as well as explicit transactions; transparent transactions have the cost of accessing the thread-specific pointer.
What happened to Vicente's suggestion to unify this for all 3 libraries? This is how you implement a resource manager with my library, but if we do bring this together it obviously shouldn't be part of my library, so there is no dependency just for that.

User code:

    int main() {
        transaction tx;
        mytransmap->insert(1);
        tx.commit();

        atomic {
            mytransmap->insert(2);
        } retry;
    }

Behind the scenes:

    struct stldb_tag {};

    class my_resource_manager {
        typedef stldb::transaction transaction;
        typedef stldb_tag tag;
        void commit_transaction(transaction &tx) { stldb::commit(tx); }
        void rollback_transaction(transaction &tx) { stldb::rollback(tx); }
        ...
    };

    typedef basic_transaction_manager<my_resource_manager> stldb_transaction_manager;
    #define BOOST_PERSISTENT_CONFIGURATION stldb_transaction_manager
    #include <boost/persistent/transaction_manager.hpp>
    #include <boost/persistent/transaction.hpp>

    class trans_map {
        void insert(int) {
            stldb::transaction &tx =
                transaction_manager::active().resource_transaction<stldb_tag>();
            // ...do something with the transaction
        }
    };
participants (14)
- Andrey Semashev
- Anthony Williams
- Bob Walters
- Brian Ravnsgaard Riis
- Ion Gaztañaga
- James Mansion
- Mathias Gaunard
- mbiddeg@mtn.co.ug
- Peter Dimov
- Rutger ter Borg
- Stefan Strasser
- strasser@uni-bremen.de
- Vicente Botet Escriba
- vicente.botet