Is there interest in a library for object (especially STL object) marshalling?

Hi there,
I'm not totally sure if this is the right "fit" for boost or not. Perhaps
this problem isn't general or abstract enough to warrant inclusion within
boost, but I thought I'd send an email around to see what people thought.
I've written a library that I've been using extensively, both for my own
code, and more recently for a large project in my work, that allows STL
objects to easily be exchanged over a function call, even if the caller and
the target function represent those STL objects using incompatible binary
layouts. This is intended in particular to deal with the common problem of
sharing STL objects between multiple compiled assemblies, such as a main
executable and satellite DLLs. I'm already using this library extensively
to exchange STL objects over pure virtual interfaces for a plugin-based
application.
The traditional advice people give is "don't use STL objects across DLL
boundaries", but this is impractical for applications that are heavily
modularized, or plugin-based applications with complex interfaces. Dealing
with raw arrays and pointers is often difficult and error-prone, which is
why we have STL containers to begin with. Returning data from a function call
can also become extremely difficult when STL containers aren't used. With
an array for example, if the total number of elements cannot be known by
the caller, this can lead to fixed-sized buffers being used with hard-coded
maximums, or functions requiring multiple calls, either to determine the
size of the data, or to free an allocated buffer that's been returned.
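As an illustration of the "multiple calls" pattern described above, here is a minimal sketch of a raw-array export together with the caller-side code needed to use it. The function names GetValues and GetValuesAsVector are hypothetical and are not taken from the library being discussed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C-style export. It copies at most `capacity` values into
// `buffer` and always returns the total number of values available, so the
// caller has to call it twice: once to learn the size, once to get the data.
extern "C" std::size_t GetValues(int* buffer, std::size_t capacity)
{
    static const int values[] = {3, 1, 4, 1, 5};
    const std::size_t total = sizeof(values) / sizeof(values[0]);
    const std::size_t count = capacity < total ? capacity : total;
    if (buffer != nullptr && count > 0) {
        std::memcpy(buffer, values, count * sizeof(int));
    }
    return total;
}

// Caller-side helper that hides the two-call dance behind a container.
inline std::vector<int> GetValuesAsVector()
{
    std::vector<int> result(GetValues(nullptr, 0));  // first call: size only
    if (!result.empty()) {
        GetValues(result.data(), result.size());     // second call: the data
    }
    return result;
}
```

Every exported function that returns a variable amount of data needs this kind of boilerplate on both sides, which is the cost the paragraph above is describing.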
Attempting to use STL containers across assembly boundaries without
addressing STL compatibility issues, however, basically requires you to use
the exact same compiler version and compiler flags when building each
assembly. That is not really possible with a plugin system designed to
allow third-party plugins, or even simply with assemblies built at
different times on different systems. It's also inconvenient for
development, since you can't easily compile just one assembly with debug
compiler settings while leaving the other assemblies compiled with release
compiler settings.
Ideally, you want the same signature for a function or method, whether the
target code is contained within the same assembly or not. The goal is to
make these assembly boundaries "invisible", so you can just write the same
code to access functions, regardless of where the code for the target
function is located. With the help of this library, along with a simple
inline wrapper function which the caller is basically unaware of, that can
be achieved, and existing code which makes use of STL objects can be
shifted into separate assemblies without refactoring. I haven't found
another publicly available library that attempts to solve this issue
comprehensively for any type.
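To make the "inline wrapper" idea above concrete, here is a minimal sketch of one way it could work for a function returning std::vector<int>. This is not the library's actual API, which is not shown in this message. Every name here (IIntVectorTarget, IntVectorTarget, ISensor, FakeSensor) is hypothetical, and the sketch assumes both sides agree on how virtual calls work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical callback interface implemented on the caller's side. The
// callee only touches the container through these virtual calls, so it never
// depends on the caller's std::vector layout.
class IIntVectorTarget
{
public:
    virtual void Resize(std::size_t count) = 0;
    virtual int* Data() = 0;
protected:
    ~IIntVectorTarget() = default;
};

// Caller-side adapter around the caller's own std::vector<int>.
class IntVectorTarget final : public IIntVectorTarget
{
public:
    explicit IntVectorTarget(std::vector<int>& target) : target_(target) {}
    void Resize(std::size_t count) override { target_.resize(count); }
    int* Data() override { return target_.data(); }
private:
    std::vector<int>& target_;
};

// Hypothetical interface shared by both assemblies.
class ISensor
{
public:
    // Inline wrapper: compiled into whichever assembly calls it, so the
    // std::vector it builds always uses the caller's own STL.
    std::vector<int> GetReadings() const
    {
        std::vector<int> result;
        IntVectorTarget target(result);
        GetReadingsMarshalled(target);
        return result;
    }
protected:
    ~ISensor() = default;
    // The only call that actually crosses the assembly boundary.
    virtual void GetReadingsMarshalled(IIntVectorTarget& target) const = 0;
};

// Stand-in for an implementation that would live in another assembly.
class FakeSensor final : public ISensor
{
protected:
    void GetReadingsMarshalled(IIntVectorTarget& target) const override
    {
        const std::vector<int> readings = {10, 20, 30};
        target.Resize(readings.size());
        for (std::size_t i = 0; i < readings.size(); ++i) {
            target.Data()[i] = readings[i];
        }
    }
};
```

From the caller's point of view, `sensor.GetReadings()` looks like an ordinary function returning a vector. The element copy through the callback interface is the price paid for not sharing the container's binary layout.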
Some of the points about my current library implementation are as follows:
-Fully template based, with all code defined in header files.
-Support for all *primary (see exclusions below) STL container types up to
C++11, including strings and keyed containers.
-Supports infinite nesting of STL container types, IE:
std::vector

On 10 Oct 2013 at 13:07, Roger Sanders wrote:
What I wanted to get an idea on is whether there is any interest in having something like this included as part of boost? Perhaps there's enough scope here for a general "marshalling" library, that would address some other similar concerns in the future? There would be a fair amount of work to do in order to "boostify" the library and write the corresponding documentation, but I'd be willing to do it if there was interest. Any feedback, thoughts, criticisms?
Me personally, I would absolutely just *love* for this library to enter Boost. Let me quickly explain why: I'm hoping, if possible, to eventually implement a standard component object layer for C++. A large part of such an implementation is STL implementation interop i.e. that across component object boundaries where component objects use different versions of an STL, or different STLs entirely, a limited amount of STL container conversion is performed. My design also allows for components compiled using one compiler e.g. Visual Studio using Dinkumware generating a PE DLL on Windows, to be equally treated by a Linux program compiled using GCC with libstdc++ i.e. a Linux program can use Windows binaries, and vice versa (obviously only if no platform specific code is used e.g. STL only). Your library, if ported to Boost, would shave a huge chunk off the work I'd need to do, so very much yes please.
(FYI there was a presentation at C++ Now 2013 on a topic very similar to your library by John Bandela "Easy Binary Compatible C++ Interfaces Across Compilers". See https://github.com/boostcon/cppnow_presentations_2013/blob/master/tue/easy_binary_compat.pdf?raw=true. It might be worth you touching base with him).
Niall
--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 10/10/2013 3:07 PM, Quoth Roger Sanders:
The only compiler requirement here is that each assembly uses a compatible C++ ABI, or at least, as much as is required to allow calling a virtual member function on an object that was constructed within another assembly.
That could be a steep requirement, since AFAIK vtable location and layout isn't standardised, especially once you start getting into multiple and virtual inheritance. And name mangling can get in the way of finding the things to be called in the first place. And memory management is always entertaining given that all sorts of weird and different allocators can be in use even before you introduce a different compiler into the mix. (Though this is where shared_ptr and unique_ptr's pointer+deleter concept can get you out of a jam, although it's more common to use opaque handles and explicit destroy functions.) Having said that: if you can make it work, it would be awesome.
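The two memory-management approaches mentioned above (opaque handles with explicit destroy functions, and unique_ptr's pointer+deleter pairing) can be combined. The sketch below is only an illustration: Widget and the Widget* functions are hypothetical, and the "module side" is defined in the same file purely so the example is self-contained.

```cpp
#include <memory>

struct Widget;  // opaque to callers: only the owning module knows the layout

// Hypothetical C-style API exported by the module.
extern "C" {
Widget* WidgetCreate(int value);
int WidgetValue(const Widget* widget);
void WidgetDestroy(Widget* widget);
int WidgetLiveCount();
}

// Caller side: the deleter routes destruction back to the module that
// allocated the object, so the memory is freed by the matching allocator.
struct WidgetDeleter
{
    void operator()(Widget* widget) const { WidgetDestroy(widget); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

inline WidgetPtr MakeWidget(int value)
{
    return WidgetPtr(WidgetCreate(value));
}

// Module side (would normally be compiled into the DLL or shared object).
struct Widget { int value; };
namespace { int g_liveWidgets = 0; }

extern "C" Widget* WidgetCreate(int value)
{
    ++g_liveWidgets;
    return new Widget{value};
}
extern "C" int WidgetValue(const Widget* widget) { return widget->value; }
extern "C" void WidgetDestroy(Widget* widget)
{
    if (widget != nullptr) {
        delete widget;
        --g_liveWidgets;
    }
}
extern "C" int WidgetLiveCount() { return g_liveWidgets; }
```

Note that the unique_ptr itself is built entirely on the caller's side from a raw pointer. Only the raw handle and plain function calls cross the boundary.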

On 11 Oct 2013 at 11:30, Gavin Lambert wrote:
On 10/10/2013 3:07 PM, Quoth Roger Sanders:
The only compiler requirement here is that each assembly uses a compatible C++ ABI, or at least, as much as is required to allow calling a virtual member function on an object that was constructed within another assembly.
That could be a steep requirement, since AFAIK vtable location and layout isn't standardised, especially once you start getting into multiple and virtual inheritance. And name mangling can get in the way of finding the things to be called in the first place.
We had an internal prototype at a former employer of mine which interoped between the Itanium ABI (GCC) and MSVC as a thought exercise of just how bad things could get. Single inheritance vtables are actually identical between the two, but virtual inheritance is not so we didn't allow that, and multiple inheritance can be worked around with some difficulty (amongst the issues MSVC doesn't do empty base class optimisation at times, and persuading GCC to respect that is non-obvious). An interesting caveat is that x86 MSVC uses stdcall calling convention for member function calls, but x64 MSVC uses cdecl. Therefore on x86 the interop thunk has to flip the parameter stack and do callee cleanup. You also have to avoid all returns completely, because MSVC and GCC don't do the same thing, so returning via a pointer write is the only portable way. Mangling was very straightforward: MSVC's is far more rich up to the point of needing a non-trivial tokenising parser, and everything the Itanium ABI specifies is easily representable in MSVC's, albeit sometimes not obviously (MSVC inverts what an array type means in mangling at times depending on context). For the other way round generating a link error where it can't interop is straightforward.
And memory management is always entertaining given that all sorts of weird and different allocators can be in use even before you introduce a different compiler into the mix. (Though this is where shared_ptr and unique_ptr's pointer+deleter concept can get you out of a jam, although it's more common to use opaque handles and explicit destroy functions.)
C libraries are actually somewhat binary interchangeable, at least more so than STLs. Local C library malloc was always used in our prototype.
Having said that: if you can make it work, it would be awesome.
Getting the above system to work reliably needs a graph database to mark out what needs to be done with what, and what does work versus the majority which does not work. Such a graph database needs to work at the binary load layer, and therefore needs to be the filing system and little extra. For such a graph database to have any chance of performance, it needs async batch file i/o. And hence me writing Boost.AFIO, with hopefully that thin graph database coming next. It'll take at least a year :)
Niall

On 11 October 2013 09:30, Gavin Lambert wrote:
On 10/10/2013 3:07 PM, Quoth Roger Sanders:
The only compiler requirement here is that each assembly uses a compatible C++ ABI, or at least, as much as is required to allow calling a virtual member function on an object that was constructed within another assembly.
That could be a steep requirement, since AFAIK vtable location and layout isn't standardised, especially once you start getting into multiple and virtual inheritance. And name mangling can get in the way of finding the things to be called in the first place.
I'd love to have something that didn't rely on ABI compatibility, but I don't think that's really going to be possible, or at least, not while adhering to the portability requirements of a boost library. With the new standard layout type definition in C++11, and with the help of the new alignment querying/control in C++11, it's now finally possible to produce a data structure that you can create in memory and guarantee you can correctly read in code compiled under a different compiler, without relying on undefined or implementation-specific behaviour. Unfortunately, while sharing data is nice, C++11 has done nothing to help with the bigger issue of sharing code.
Using the extern keyword with a linkage specification (IE, as in the extern "C" usage) is too limited and tries to combine a few different concepts together, and as a result, ends up doing a poor job at all of them. What we really need is a new keyword, separate to the extern keyword, that is specifically intended to specify a platform-specific function "format", IE, the calling convention, that the compiler should use for that function. In particular, as opposed to the extern keyword, it should:
1. Be able to be applied on member functions as well as non-member functions
2. Be able to be used on overloaded functions
3. Not affect function linkage when used (IE, doesn't mark a function as having external linkage)
With a new keyword that meets these requirements, you'd be able to actually mark a given function as explicitly having a particular calling convention. As long as two compilers support a compatible calling convention, you could then safely invoke that function from code compiled using another compiler, or even another language. The calling convention is still "implementation specific", as it has to be, since calling conventions are by their very nature platform specific, but it's now supported by the language, with compile-time checking for compatibility.
With support for calling an individual function with compile-time checking for calling convention compatibility, you could then just provide a base class which meets the requirements of a standard layout type, and contains function pointers to each exposed function, with some inline wrapper functions to invoke them. This would give identical usage syntax to calling the methods natively, fully checked for compatibility at compile time, with behaviour guaranteed by the C++ standard, without requiring ABI compatibility. Without this kind of an enhancement to the language itself, I don't think any boost interop library that avoids assuming some level of ABI compatibility is going to give a satisfying result.
I am optimistic on another front though. The simple fact is, for any given platform, there's a limited number of C++ compilers people actually use, and most of them attempt a degree of compatibility anyway. The biggest problem is really on x86/x64 systems, with GCC and MSVC being incompatible. Clang is rapidly developing though, and I've seen there's a push for Clang to have full ABI support for both GCC and MSVC, IE, as a compiler switch to select between the two. Once we reach that point, it really just becomes a matter of any given software system selecting a base ABI to use, and Clang can then be used regardless of that choice to compile compatible code for that system. With this kind of solution in place, you could then just write code relying on ABI compatibility, and implement a language-supported solution when one becomes available (IE, just by marking all functions on an exposed interface with the kind of keyword I've proposed above). Whatever you attempt with ABI compatibility, you still need to solve the STL issue as a higher-level problem, which is what I'm proposing to add to boost at this stage.
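The "standard layout type containing function pointers, with inline wrappers" idea described above can be sketched in today's C++ as follows. This is only an illustration under stated assumptions. The proposed calling-convention keyword does not exist, so the function pointers below simply use the compiler's default convention. All names (CounterInterface, CounterState, MakeCounterInterface) are hypothetical.

```cpp
#include <type_traits>

// Hypothetical interface table: an opaque pointer to the implementation's
// state plus plain function pointers. There are no virtual functions, so no
// vtable layout needs to be shared. The calling convention of the function
// pointers is left at the compiler default here, which is exactly the part
// that a language-level keyword would need to pin down.
struct CounterInterface
{
    void* self;
    void (*add)(void* self, int amount);
    int (*get)(const void* self);

    // Inline wrappers compiled into the caller: usage looks like ordinary
    // member function calls.
    void Add(int amount) { add(self, amount); }
    int Get() const { return get(self); }
};

static_assert(std::is_standard_layout<CounterInterface>::value,
              "the interface table must be a standard layout type");

// Implementation side (would normally live in another module).
struct CounterState { int total; };

void CounterAdd(void* self, int amount)
{
    static_cast<CounterState*>(self)->total += amount;
}

int CounterGet(const void* self)
{
    return static_cast<const CounterState*>(self)->total;
}

CounterInterface MakeCounterInterface(CounterState& state)
{
    return CounterInterface{&state, &CounterAdd, &CounterGet};
}
```

Non-virtual member functions do not affect whether a type is standard layout, which is why the wrappers can live directly on the table type.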
And memory management is always entertaining given that all sorts of weird and different allocators can be in use even before you introduce a different compiler into the mix. (Though this is where shared_ptr and unique_ptr's pointer+deleter concept can get you out of a jam, although it's more common to use opaque handles and explicit destroy functions.)
I may be able to package up something for boost on the memory management side that can help too actually. In my work, we had the issue of wanting to be able to create an instance of a type, where only a base interface of that type was known to external code, with the actual implementation being completely internal to a particular module. We wanted to be able to easily create these types without having to specifically call allocators and deallocators (especially the deallocators, to avoid memory leaks), and we needed to be able to pass these types back from function calls, again, without the caller having to manually deallocate them. Something like shared_ptr was unsuitable though, not only because it felt too heavy-handed (we didn't need or want the reference counting), but also because it couldn't be passed safely across DLL boundaries either.
My solution was a very thin "pointer" type, which basically just referenced the allocator and deallocator within its constructor and destructor, and used C++11 move semantics to allow returning them easily from function calls. The final version was macro-based to automatically generate these pointer types for any interface type, with only a couple of macro calls to create all the necessary code, including the allocators and deallocators. This pointer system, along with the STL interop code, is the basis for the communication model we use between components, and the payoff is very significant. We have easy bi-directional communication across pure virtual interfaces, with STL objects being shared natively on those interfaces, with minimal overhead and without ever having to manually call an allocator or deallocator, or manually unpack or repack an STL container.
You can write code which creates a type and calls functions on that type, which is actually calling an allocator from a different module and invoking functions over a pure virtual interface for that type, with STL objects being shared, that looks identical to creating that type on the stack and calling functions on that type as if it was defined within the same module, with a compatible STL implementation. I could have another look at this code and see if it could be improved. I think it should be possible to build a more general fully template-based solution that doesn't rely on macros, in which case it might be a worthwhile addition along with the STL marshalling classes.
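A thin, move-only pointer of the kind described above might look roughly like the following template-based sketch. It is not the macro-generated code from the project mentioned. InteropPtr, IShape, CreateSquare, DestroyShape and MakeSquare are all hypothetical names. As a simplification, the allocator is called from a factory function rather than from the pointer's constructor.

```cpp
#include <utility>

// Interface known to external code. The destructor is protected so that
// callers cannot delete through the interface with the wrong allocator.
class IShape
{
public:
    virtual double Area() const = 0;
protected:
    ~IShape() = default;
};

// Thin move-only owner: stores the object and the deallocator supplied by
// the module that created it. No reference counting.
template <class T>
class InteropPtr
{
public:
    typedef void (*Deleter)(T*);

    InteropPtr(T* object, Deleter deleter) : object_(object), deleter_(deleter) {}
    InteropPtr(InteropPtr&& other) noexcept
        : object_(other.object_), deleter_(other.deleter_)
    {
        other.object_ = nullptr;
    }
    InteropPtr& operator=(InteropPtr&& other) noexcept
    {
        if (this != &other) {
            Reset();
            object_ = other.object_;
            deleter_ = other.deleter_;
            other.object_ = nullptr;
        }
        return *this;
    }
    InteropPtr(const InteropPtr&) = delete;
    InteropPtr& operator=(const InteropPtr&) = delete;
    ~InteropPtr() { Reset(); }

    T* operator->() const { return object_; }
    explicit operator bool() const { return object_ != nullptr; }

private:
    void Reset()
    {
        if (object_ != nullptr) {
            deleter_(object_);  // always freed by the creating module
            object_ = nullptr;
        }
    }
    T* object_;
    Deleter deleter_;
};

// Module side: the concrete type and its allocator/deallocator pair.
class Square final : public IShape
{
public:
    explicit Square(double side) : side_(side) {}
    double Area() const override { return side_ * side_; }
private:
    double side_;
};

int g_liveShapes = 0;
IShape* CreateSquare(double side) { ++g_liveShapes; return new Square(side); }
void DestroyShape(IShape* shape) { delete static_cast<Square*>(shape); --g_liveShapes; }

// Factory returning the owner by value, relying on move semantics.
inline InteropPtr<IShape> MakeSquare(double side)
{
    return InteropPtr<IShape>(CreateSquare(side), &DestroyShape);
}
```

Calling code can then write `auto shape = MakeSquare(3.0); shape->Area();` and never call the deallocator itself.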
participants (3)
- Gavin Lambert
- Niall Douglas
- Roger Sanders