[interprocess] sharing memory between 32 bit and 64 bit Windows processes

Hello Ion,

I also need this functionality badly. Is there any chance of you implementing it? Basically, using unsigned int instead of size_t and int instead of ptrdiff_t, so the binary layout of shared memory is always like in 32 bit, would in practice do the trick.

If we implement it ourselves, would you support getting this into boost? I cannot promise complete support. I already saw that rbtree_best_fit has a problem because the ring list containing the memory blocks needs to wrap around from the last block to the first. Would porting Doug Lea's allocator be an option? It will be hard to beat something that has been so thoroughly used and improved.

Cheers, Arno

--
Dr. Arno Schödl | aschoedl@think-cell.com
Technical Director
think-cell Software GmbH | Chausseestr. 8/E | 10115 Berlin | Germany
http://www.think-cell.com | phone +49 30 666473-10 | US phone +1 800 891 8091
Amtsgericht Berlin-Charlottenburg, HRB 85229 | European Union VAT Id DE813474306
Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl

On 18/01/2011 13:02, Arno Schödl wrote:
> Hello Ion,
> I also need this functionality badly. Is there any chance of you implementing it? Basically, using unsigned int instead of size_t and int instead of ptrdiff_t, so the binary layout of shared memory is always like in 32 bit, would in practice do the trick.

I don't think that would work. The compiler ABI might have other changes for 64 bit (empty base optimization, new alignment requirements, etc.) that would break everything. I also don't know the distance between offset_ptrs created on the stack (e.g. temporaries) and those in shared memory. If that distance (I think it depends on the address the OS reserves for the shared memory) is bigger than 2GB, then you are lost.

> If we implement it ourselves, would you support getting this into boost? I cannot promise complete support. I already saw that rbtree_best_fit has a problem because the ring list containing the memory blocks needs to wrap around from the last block to the first. Would porting Doug Lea's allocator be an option? It will be hard to beat something that has been so thoroughly used and improved.

If the code is good enough ;-) Porting DLMalloc is not easy. It preallocates some bins for small allocations, and it relies on growing heap memory. For shared memory this scheme is not very good because you can't grow the existing space for all processes, but I think you could adapt it. I don't understand the problem with rbtree_best_fit.

Best, Ion
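
The 2GB concern in this reply can be made concrete with a small check. This is only a sketch under the assumption that the offset is stored in a signed 32-bit field; `fits_in_32bit_offset` is a hypothetical helper and is not part of Boost.Interprocess, and the addresses in the checks are made-up examples.

```cpp
#include <cstdint>

// Hypothetical helper: can the signed distance from an offset pointer's own
// address to its target be stored in a signed 32-bit field?
constexpr bool fits_in_32bit_offset(std::uint64_t self_addr, std::uint64_t target_addr)
{
    return target_addr >= self_addr
        ? (target_addr - self_addr) <= 0x7FFFFFFFull    // forward:  at most INT32_MAX
        : (self_addr - target_addr) <= 0x80000000ull;   // backward: at most -INT32_MIN
}

// A pointer stored next to its target is fine.
static_assert(fits_in_32bit_offset(0x1000, 0x2000), "nearby target fits");

// A temporary on the stack of a 64-bit process can be far more than 2GB away
// from the address where the shared memory happens to be mapped.
static_assert(!fits_in_32bit_offset(0x0000000000100000ull, 0x00007FF000000000ull),
              "distant target does not fit");
```
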

Do you know for a fact that MSVC 32-bit and 64-bit are so different, or is this just a general concern? We had our own hand-rolled mixed-bitness implementation of shared memory where I did not see ABI differences, but we may have been just lucky. Then we switched to boost.interprocess...

I still think that even if there are ABI problems in certain scenarios, it is worth getting to work whatever works. You cannot blindly throw objects into shared memory anyway. At least on Windows, modules are not guaranteed to be loaded at the same address, which breaks virtual function tables even if both processes have the same bitness.

Thanks for pointing out the 2GB problem. In our old implementation, we had the base address in a global variable because we only needed a single shared pool. We could create a special offset_ptr that knows its segment manager type (by template parameter), and a special segment manager that is required to be a singleton, with a static member holding its base address. This special segment manager could still have an index template parameter so the user can have more than one of them, albeit decided at compile time. I don't think that many people need a runtime-determined number of pools.
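
The singleton idea from the last paragraph could look roughly like the sketch below. All names (`static_segment`, `based_ptr`) are hypothetical and do not exist in Boost.Interprocess; the sketch also assumes that offset 0 can be reserved for null, so the very first byte of the segment cannot be pointed to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical singleton segment: one static base address per compile-time index.
template <int Index>
struct static_segment
{
    static char* base;   // to be set once per process, after mapping the memory
};
template <int Index>
char* static_segment<Index>::base = nullptr;

// Hypothetical pointer storing a 32-bit offset from the segment base rather
// than from its own address, so it also works for temporaries on the stack.
template <class T, class Segment>
class based_ptr
{
    std::uint32_t offset_;   // 0 means null in this sketch
public:
    based_ptr() : offset_(0) {}
    explicit based_ptr(T* p) : offset_(0)
    {
        if (p) {
            const std::ptrdiff_t d = reinterpret_cast<const char*>(p) - Segment::base;
            // The target must lie inside the first 4GB of the segment.
            assert(d > 0 && static_cast<std::uint64_t>(d) <= 0xFFFFFFFFull);
            offset_ = static_cast<std::uint32_t>(d);
        }
    }
    T* get() const
    {
        return offset_ ? reinterpret_cast<T*>(Segment::base + offset_) : nullptr;
    }
    T& operator*() const { return *get(); }
};
```

Each process would set `static_segment<0>::base` to its own mapping address; the stored offsets stay valid across processes because they never depend on where the pointer object itself lives.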
Our old implementation was a hacked DLMalloc. We reserved x MBs of virtual address space, just like managed_windows_shared_memory, and simulated sbrk within it.

The rbtree_best_fit problem is the 2GB problem in a different form. The last memory block is given the size (address_of_first_block - address_of_last_block), so that walking the ring list from the last block leads back to the first block.

Cheers, Arno
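
A simulated sbrk inside a fixed reservation can be sketched as a simple bump of a "break" offset. This is not the code described in the message; `region_sbrk` is a hypothetical name, and for brevity the break offset lives in the object itself, whereas in real shared memory it would have to live inside the shared region so that all processes see it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sbrk within a pre-reserved, fixed-size region.
class region_sbrk
{
    char*       base_;
    std::size_t capacity_;
    std::size_t brk_;      // current break, as an offset from base_
public:
    region_sbrk(void* base, std::size_t capacity)
        : base_(static_cast<char*>(base)), capacity_(capacity), brk_(0)
    {
        assert(base != nullptr);
    }

    // Advance the break by `increment` bytes and return the old break,
    // or nullptr if the reservation is exhausted (it can never grow).
    void* sbrk(std::size_t increment)
    {
        if (increment > capacity_ - brk_) return nullptr;
        char* old_break = base_ + brk_;
        brk_ += increment;
        return old_break;
    }

    std::size_t used() const { return brk_; }
};
```
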

Hello Ion,

I think we have a possible solution: Making size_t/ptrdiff_t always 64 bit makes the same offset_ptr work fine in both environments, and solves similar problems in rbtree_best_fit as well. We have a prototype working.

Can we template the size_t/ptrdiff_t size into offset_ptr? For example:

template <class PointedType, class DifferenceType=ptrdiff_t> class offset_ptr;

Then all other classes could derive their size_types and difference_types from their pointer type as unsigned/signed std::iterator_traits<pointer_type>::difference_type. Is that ok with you?

Cheers, Arno
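
To show what a difference-type parameter buys, here is a heavily simplified self-relative pointer. It is a sketch, not the Boost.Interprocess implementation or the prototype mentioned above: `sketch_offset_ptr` is a hypothetical name, and the sketch assumes that offset 1 can be reserved for null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <class PointedType, class DifferenceType = std::ptrdiff_t>
class sketch_offset_ptr
{
    DifferenceType offset_;   // distance from this object to the pointee; 1 means null

    static DifferenceType distance(const void* from, const void* to)
    {
        return static_cast<DifferenceType>(
            reinterpret_cast<std::intptr_t>(to) - reinterpret_cast<std::intptr_t>(from));
    }
public:
    typedef DifferenceType difference_type;

    sketch_offset_ptr(PointedType* p = nullptr) { set(p); }
    // A copy lives at a different address, so the offset must be recomputed.
    sketch_offset_ptr(const sketch_offset_ptr& other) { set(other.get()); }
    sketch_offset_ptr& operator=(const sketch_offset_ptr& other) { set(other.get()); return *this; }

    void set(PointedType* p)
    {
        offset_ = p ? distance(this, p) : 1;
        assert(get() == p);   // fails if the distance does not fit DifferenceType
    }
    PointedType* get() const
    {
        if (offset_ == 1) return nullptr;
        return reinterpret_cast<PointedType*>(
            reinterpret_cast<std::intptr_t>(this) + static_cast<std::intptr_t>(offset_));
    }
};

// With a fixed-width difference type the stored field has the same size
// in 32-bit and 64-bit builds.
static_assert(sizeof(sketch_offset_ptr<int, std::int64_t>) == 8, "offset is always 64 bit");
```
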

On 08/02/2011 17:39, Arno Schödl wrote:
> Hello Ion,
> I think we have a possible solution: Making size_t/ptrdiff_t always 64 bit makes the same offset_ptr work fine in both environments, and solves similar problems in rbtree_best_fit as well. We have a prototype working.

Nice to hear it.

> Can we template the size_t/ptrdiff_t size into offset_ptr? For example:
> template<class PointedType, class DifferenceType=ptrdiff_t> class offset_ptr;
> Then all other classes could derive their size_types and difference_types from their pointer type as unsigned/signed std::iterator_traits<pointer_type>::difference_type. Is that ok with you?

That could be reasonable, but I don't know how much code we'll need to make dependent on the pointer type. Managed memories are already dependent on the pointer type, so I guess there is not much impact. Containers and other types will be harder: in these cases we'll need to make the allocator's pointer type customizable so that the container uses the corresponding size_type/difference_type.

I'll add this to my to-do list, but I'm afraid I have higher priority issues right now in the library.

Best, Ion
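
Deriving both types from the pointer type, as discussed here, might be expressed with a small trait. This is a sketch only: `pointer_size_types` and `fixed64_ptr` are hypothetical names, and `fixed64_ptr` is a bare stand-in that supplies just the nested typedefs `std::iterator_traits` looks for.

```cpp
#include <cstdint>
#include <iterator>
#include <type_traits>

// Hypothetical trait: difference_type comes from the pointer type,
// size_type is its unsigned counterpart.
template <class Pointer>
struct pointer_size_types
{
    typedef typename std::iterator_traits<Pointer>::difference_type difference_type;
    typedef typename std::make_unsigned<difference_type>::type      size_type;
};

// Stand-in for a pointer type with a fixed 64-bit difference type.
struct fixed64_ptr
{
    typedef std::int64_t                    difference_type;
    typedef int                             value_type;
    typedef int*                            pointer;
    typedef int&                            reference;
    typedef std::random_access_iterator_tag iterator_category;
};

// A container using this trait gets 64-bit size types on every platform...
static_assert(sizeof(pointer_size_types<fixed64_ptr>::size_type) == 8, "64-bit size_type");
// ...while raw pointers keep the platform's native types.
static_assert(std::is_same<pointer_size_types<int*>::difference_type, std::ptrdiff_t>::value,
              "raw pointers use ptrdiff_t");
```
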

We'd implement it, as long as you are willing to review it and allow it into boost.

Regards, Arno

> We'd implement it, as long as you are willing to review it and allow it into boost.

Don't worry, I'll review it,

Hello Ion,

there is a design decision to make w.r.t. mixing 32 bit and 64 bit environments:

- In a mixed environment, each shared heap can be no larger than 32 bit addressing allows.
- offset_ptr must have the ability to point to any place _in any of potentially many heaps_ because at the time of dereferencing it has no information about the heap it is meant to point to. This requires a heap-relative 32 bit quantity plus some "heap selector", or a this-relative 64 bit quantity. The latter is simple, fast and not significantly less compact than the former, so we pick the latter.
- Option 1: For the rest of the code, the easiest option is to make everything 64 bit, e.g., deriving size_type et al. from offset_ptr. Unfortunately, this makes all interfaces 64 bit, so even on 32 bit you can request a 64 bit shared heap. Internally, when going to the OS to allocate the heap, there is a cast to 32-bit size_t with data loss, which is ugly.
- Option 2: So the interfaces should really be 32 bit on both 32 bit and 64 bit. This would require a second template parameter size_type for the MemoryAlgorithm, in addition to the pointer type. We can then migrate parts of the code to use this 32 bit quantity as much as possible for compactness.

I think Option 2 is safer and philosophically closer to what is really going on. What do you prefer?

Arno
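
The data-loss problem named under Option 1 is the silent narrowing from a 64-bit interface size to the native size type. One way to at least make it loud is a checked conversion, sketched below; `narrow_heap_size` is a hypothetical helper, not an existing Boost.Interprocess function, and the native type is a template parameter here only so that the 32-bit case can be exercised on any platform.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a 64-bit heap size taken from the interface
// into the size type the OS call expects, rejecting values that would be
// truncated instead of silently losing the upper bits.
template <class NativeSize>
NativeSize narrow_heap_size(std::uint64_t requested)
{
    if (requested > static_cast<std::uint64_t>(std::numeric_limits<NativeSize>::max()))
        throw std::length_error("requested heap size does not fit the native size type");
    const NativeSize result = static_cast<NativeSize>(requested);
    assert(static_cast<std::uint64_t>(result) == requested);   // nothing was lost
    return result;
}
```

With Option 2 such a check would be unnecessary, because a 32-bit size_type in the interface cannot express an oversized request in the first place.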

participants (2)
- Arno Schödl
- Ion Gaztañaga