[serialization] Dealing with any tainted types.

The serialization library allows for the painless serialization of most data types and even polymorphic types like variants and vectors. There is one type, however, that doesn't have a serialize function that makes sense. That type is any. It is easy to see that if we make an any serialize function, we're going to need to make at least some assumptions about which subset of types it might contain.

I have a complex algebraic datatype, let's call it C, that has a member (that has a member...) of type T. T is "tainted" with a member of type any. I'm looking at the possible ways to serialize C with Boost.Serialization. We can know, at runtime, which subset of types the any member of T can contain and how to serialize each of those types. Let's call the dictionary type that has that information D.

Here are the options I've come up with so far...

Option 1:
Make some global variable of type D, called d. Before calling any serialization of C, I ensure d has the correct value. The serialize function for T uses d to figure out how to serialize the any member.

This option would certainly work, but I'm using a global variable as a workaround for the fact that I cannot add arguments to T's serialize function. No points for beauty here.

Option 2:
Instead of serializing a value of type C, serialize a value of type "struct CWithDict { C c; D d; }". In this serialization function I can use d whenever I need.

Unfortunately the contents of this serialize function would need to duplicate most of the functionality of Boost.Serialization in the first place, since T is buried deep within C's structure. Although this option works, it requires rewriting a bunch of serialize functions, which isn't attractive.

Option 3:
Make an archive wrapper:

  template< typename Archive > struct DArchive { Archive a; D d; };

DArchive would model the Archive concept by forwarding functionality to a. However, the T serialize function could access DArchive's d member for serialization of the any.

This would solve the problem at the expense of extending the meaning of Archive a bit. It seems pretty elegant to me.

So, that's what I've come up with. I'm interested in comments. Does anyone know of a better way to do this? Could this possibly lead to a general mechanism for similar problems?

TIA,
David

-- 
David Sankel
Sankel Software
www.sankelsoftware.com
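Option 1 can be made concrete with a small sketch of what D might look like. This is an illustration under stated assumptions and is not code from the thread. C++17 std::any stands in for boost::any, and a std::ostream stands in for a Boost archive. The names register_type and save_any are hypothetical. The global d is the part David objects to.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical dictionary type D: maps each type the any member may hold
// to a function that writes such a value. A std::ostream stands in for a
// Boost archive to keep the sketch self-contained.
using D = std::unordered_map<std::type_index,
                             std::function<void(std::ostream&, const std::any&)>>;

// Registers a saver for type V (V must be streamable with operator<<).
template <class V>
void register_type(D& dict) {
    dict[std::type_index(typeid(V))] = [](std::ostream& os, const std::any& a) {
        os << std::any_cast<const V&>(a);
    };
}

// Option 1: one global dictionary, set up before serializing any C.
D d;

// What T's serialize function would do with its any member: find the saver
// for the held type in the global d. Returns false for unregistered types.
bool save_any(std::ostream& os, const std::any& a) {
    auto it = d.find(std::type_index(a.type()));
    if (it == d.end()) return false;
    it->second(os, a);
    return true;
}
```

Because d is a single program-wide object, two serializations that need different dictionaries cannot safely run at the same time. This applies, for example, to serializations on different threads.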

David Sankel wrote:
> The serialization library allows for the painless serialization of most data types and even polymorphic types like variants and vectors.
> There is one type, however, that doesn't have a serialize function that makes sense. That type is any. It is easy to see that if we make an any serialize function, we're going to need to make at least some assumptions about which subset of types it might contain.
> I have a complex algebraic datatype, let's call it C, that has a member (that has a member...) of type T. T is "tainted" with a member of type any. I'm looking at the possible ways to serialize C with Boost.Serialization. We can know, at runtime, which subset of types the any member of T can contain and how to serialize each of those types. Let's call the dictionary type that has that information D.
> Here are the options I've come up with so far...
> Option 1:
> Make some global variable of type D, called d. Before calling any serialization of C, I ensure d has the correct value. The serialize function for T uses d to figure out how to serialize the any member.
> This option would certainly work, but I'm using a global variable as a workaround for the fact that I cannot add arguments to T's serialize function. No points for beauty here.
Take a look at how shared_ptr is serialized. It seems to me a similar problem. This was handled by adding a "helper" class just for the shared_ptr type. Such a "helper" could hold the otherwise "global" variable just for that archive instance, thus maintaining the thread-safe characteristic of the library.
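Robert's suggestion can be sketched as a helper base class that the archive inherits from. The dictionary then belongs to one archive instance instead of the whole program. Everything in this sketch is hypothetical: any_helper, dict_oarchive, and the save member of T are invented names. It uses std::any instead of boost::any and a toy stream-backed archive rather than one of Boost's archive classes.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical helper: holds the dictionary for ONE archive instance.
struct any_helper {
    std::unordered_map<std::type_index,
                       std::function<void(std::ostream&, const std::any&)>> dict;
};

// Toy output archive with the helper mixed in by inheritance, in the
// spirit of "text_iarchive : naked_text_iarchive, shared_ptr_helper".
struct dict_oarchive : any_helper {
    std::ostringstream os;
};

// The "tainted" type T, with an any member.
struct T {
    std::any value;

    // Works with any archive that derives from any_helper and has an os member.
    template <class Archive>
    void save(Archive& ar) const {
        any_helper& h = ar;  // per-archive state instead of a global
        auto it = h.dict.find(std::type_index(value.type()));
        if (it != h.dict.end()) it->second(ar.os, value);
    }
};
```

Each archive carries its own dictionary, so two archives with different dictionaries do not interfere. This is the thread-safety point made above.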
> Option 2:
> Instead of serializing a value of type C, serialize a value of type "struct CWithDict { C c; D d; }". In this serialization function I can use d whenever I need.
> Unfortunately the contents of this serialize function would need to duplicate most of the functionality of Boost.Serialization in the first place, since T is buried deep within C's structure. Although this option works, it requires rewriting a bunch of serialize functions, which isn't attractive.
Take a look at "extended type info". This extends the rtti system to handle types identified by a string at runtime. This is the basis for the "export" functionality.
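The idea behind export can also be applied to an any member. That idea is to identify a type by a string key so that a loader can recreate the right type. The following is a hypothetical sketch of that idea only and is not the extended_type_info API. The registry type, its keys, and the "key value" text format are assumptions. As before, std::any stands in for boost::any.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>

// One entry per type the any may hold: a string key, a saver and a loader.
struct entry {
    std::string key;
    std::function<void(std::ostream&, const std::any&)> save;
    std::function<std::any(std::istream&)> load;
};

// Hypothetical registry, indexed both by C++ type (for saving)
// and by string key (for loading).
struct registry {
    std::map<std::type_index, entry> by_type;
    std::map<std::string, entry> by_key;

    template <class V>
    void add(const std::string& key) {
        entry e{key,
                [](std::ostream& os, const std::any& a) { os << std::any_cast<const V&>(a); },
                [](std::istream& is) { V v{}; is >> v; return std::any(v); }};
        by_type.emplace(std::type_index(typeid(V)), e);
        by_key.emplace(key, e);
    }

    // Write the type key first, then the value.
    void save(std::ostream& os, const std::any& a) const {
        const entry& e = by_type.at(std::type_index(a.type()));
        os << e.key << ' ';
        e.save(os, a);
    }

    // Read the key, then dispatch to the matching loader.
    std::any load(std::istream& is) const {
        std::string key;
        is >> key;
        return by_key.at(key).load(is);
    }
};
```

On load, the key read from the stream selects the loader. That is what lets an any be restored without knowing its content type statically.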
> Option 3:
> Make an archive wrapper:
> template< typename Archive > struct DArchive { Archive a; D d; };
> DArchive would model the Archive concept by forwarding functionality to a. However, the T serialize function could access DArchive's d member for serialization of the any.
> This would solve the problem at the expense of extending the meaning of Archive a bit. It seems pretty elegant to me.
This seems similar to the "helper" described above. That is, there is the concept of a "naked_text_iarchive". text_iarchive looks something like:

  class text_iarchive : public naked_text_iarchive, shared_ptr_helper { ... };

This seems similar to what you want to do.

There are a couple of problems with this. It's become clear that the "Archive Concept" is currently ambiguous, and this needs improvement to support the construction of robust archives by other users. The current efforts to do this have been successful, but this ambiguity makes these archives much more fragile than they should be.

Some time ago, the concept of a dynamic "helper" was built into the archive base class. This permitted the attachment at runtime of code by types which otherwise would not be serializable. The only type that needed this at the time was shared_ptr. I didn't document it, as I saw it as a carbuncle on the face of my otherwise pristine library. No other type ever needed that "hack". Then I took that code out and added the specific helper for the shared_ptr type, which exists to this day.

Of course, you might guess what happened. Soon after I took that code out and changed to the "statically" added shared_ptr_helper, a new type appeared (flyweight) which was not serializable without similar functionality. There was talk about going back to the old system, but I couldn't face doing the extra work and it never got done. Of course this would require some more documentation and concepts, etc., etc. Also, even though such a facility is almost never necessary, people would start to use it, and then there would be a whole 'nother source of questions to support. So, ........

To really do this right, I see the following as necessary:

a) Clarify and simplify the current archive concept. I've thought about this a lot and know what I want to do, but I'm not excited enough to do it.
b) Go back to the original runtime helper and update the documentation accordingly.
c) Tweak the shared_ptr serialization to use the runtime helper.
d) Redefine ?_iarchive as what naked_?_iarchive is now.

It might not seem like a huge amount of work, but it's enough to dissuade me from starting. It would also mean eliminating the word "naked" from my naming, which I've grown fond of.
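Point b) refers to the runtime helper mechanism Robert removed. Its code is not shown in this thread, so the following is only a guess at the general shape of such a facility. The archive base keeps one lazily created helper per helper type, and a type that needs per-archive state asks for its helper by type. archive_base, get_helper and counting_helper are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical archive base that lets types attach helpers at runtime.
class archive_base {
public:
    // Returns this archive's helper of type H, creating it on first use.
    template <class H>
    H& get_helper() {
        auto& slot = helpers_[std::type_index(typeid(H))];
        if (!slot) slot = std::make_shared<H>();
        return *static_cast<H*>(slot.get());
    }

private:
    // shared_ptr<void> keeps the correct deleter for each helper type.
    std::unordered_map<std::type_index, std::shared_ptr<void>> helpers_;
};

// Hypothetical helper used only to show per-archive state.
struct counting_helper {
    int uses = 0;
};
```

A serialize function for T could then fetch a dictionary-holding helper with something like ar.get_helper<any_dict_helper>(), where any_dict_helper is again hypothetical.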
> So, that's what I've come up with. I'm interested in comments. Does anyone know of a better way to do this? Could this possibly lead to a general mechanism for similar problems?
I think that if the archive concept were "fixed", it would permit things like you suggest: better extension through derivation. It might also permit copying of one archive type to another, to permit these extensions to be dynamic. E.g.:

  template<class Archive>
  void serialize(Archive &ar, T &t, const unsigned version){
      ArchiveWithNoTracking<Archive> arnot(ar);
      arnot & t;
  }

Food for thought.

Robert Ramey
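Robert's adapter example can be fleshed out as follows, under heavy simplification. In this hypothetical sketch, toy_oarchive stands in for a real archive, and the "t"/"n" markers stand in for object tracking being on or off. The key design choice is in the adapter's operator&: it calls nested serialize functions with the adapter itself. As a result, the changed behaviour reaches every member below the point where the adapter was created.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Toy output archive: arithmetic values are written with a "t" marker
// (tracking on); class types are asked to serialize themselves.
struct toy_oarchive {
    std::string out;

    template <class U>
    toy_oarchive& operator&(const U& u) {
        if constexpr (std::is_arithmetic_v<U>) out += std::to_string(u) + "t ";
        else u.serialize(*this, 0u);
        return *this;
    }
};

// Hypothetical adapter in the spirit of "ArchiveWithNoTracking arnot(ar)":
// it writes into the wrapped archive's buffer with an "n" marker and passes
// *itself* to nested serialize calls, so the setting is inherited.
template <class Archive>
struct ArchiveWithNoTracking {
    Archive& base;
    explicit ArchiveWithNoTracking(Archive& ar) : base(ar) {}

    template <class U>
    ArchiveWithNoTracking& operator&(const U& u) {
        if constexpr (std::is_arithmetic_v<U>) base.out += std::to_string(u) + "n ";
        else u.serialize(*this, 0u);
        return *this;
    }
};

struct Inner {
    int x;
    template <class Archive>
    void serialize(Archive& ar, unsigned) const { ar & x; }
};

struct T {
    Inner inner;
    int y;
    template <class Archive>
    void serialize(Archive& ar, unsigned) const {
        ar & y;                                    // written by the original archive
        ArchiveWithNoTracking<Archive> arnot(ar);
        arnot & inner;                             // Inner::serialize sees arnot
    }
};
```

Serializing T{Inner{1}, 2} through a toy_oarchive yields "2t 1n ". The value y is written by the original archive, and Inner's member is written by the adapter.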

Thanks for your detailed response, Robert...

On Thu, Jan 13, 2011 at 4:38 PM, Robert Ramey <ramey@rrsd.com> wrote:
> David Sankel wrote:
>> <snip>
>> Option 3:
>> Make an archive wrapper:
>> template< typename Archive > struct DArchive { Archive a; D d; };
>> DArchive would model the Archive concept by forwarding functionality to a. However, the T serialize function could access DArchive's d member for serialization of the any.
>> This would solve the problem at the expense of extending the meaning of Archive a bit. It seems pretty elegant to me.
In retrospect, this clearly won't work. As soon as a << x happens, x's serialize function is handed a, not the DArchive, so d is unavailable for the serialization of all of x's descendants.
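The failure described above can be reproduced with a small hypothetical sketch. When DArchive only forwards a << x, the serialize functions below x are instantiated with the wrapped archive type, so they have no d to look at. toy_oarchive, has_dict and the saw_dict probe are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct D { std::string name = "dict"; };  // stand-in for the dictionary

// Toy archive: class types serialize themselves with *this archive.
struct toy_oarchive {
    template <class U>
    toy_oarchive& operator<<(U& u) {
        if constexpr (!std::is_arithmetic_v<U>) u.serialize(*this);
        return *this;
    }
};

// Option 3 as described: a wrapper that forwards everything to a.
template <class Archive>
struct DArchive {
    Archive a;
    D d;
    template <class U>
    DArchive& operator<<(U& u) { a << u; return *this; }  // x now sees a, not *this
};

// Detects whether an archive type exposes a d member.
template <class A, class = void>
struct has_dict : std::false_type {};
template <class A>
struct has_dict<A, std::void_t<decltype(std::declval<A&>().d)>> : std::true_type {};

// T records whether its serialize function could reach d.
struct T {
    bool saw_dict = false;
    template <class Archive>
    void serialize(Archive&) { saw_dict = has_dict<Archive>::value; }
};

struct C {
    T t;
    template <class Archive>
    void serialize(Archive& ar) { ar << t; }
};
```

After ar << c with a DArchive<toy_oarchive>, c.t.saw_dict is false even though ar itself has a d member.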
> This seems similar to the "helper" described above. That is, there is the concept of a "naked_text_iarchive". text_iarchive looks something like:
> class text_iarchive : public naked_text_iarchive, shared_ptr_helper { ... };
> This seems similar to what you want to do.
Yup. I was able to hack something up to do what I want. But...
> <snip>
> To really do this right, I see the following as necessary:
> a) Clarify and simplify the current archive concept. I've thought about this a lot and know what I want to do, but I'm not excited enough to do it.
I've been giving some thought to this. Not so much clarifying and simplifying, but more distilling the essence of the domain. I have a feeling that if we nail the essence down, all the compositionality will be there without having to tack it on as an afterthought.

Here's what I have so far:

  concept Archive:
    struct _ where
    {
      typedef _ RState;
      typedef _ WState;

      template< typename T >
      struct lookup
      {
        typedef _ type; // This _ is either mpl::void_ or
                        // std::pair< function< void (RState&, const T&) >
                        //          , function< T (WState&) >
                        //          >
        type operator()() const;
      };
    };

A type fits the archive concept if it fills in the blanks above. RState and WState are the state information required for reading and writing. The lookup type function, passed a type T, will return either mpl::void_ or a pair. If it returns void_, we know that T is not considered a primitively serializable type. If it returns the pair, we know T is a serializable type, witnessed by the pair of write and read functions returned by operator().

One key condition is that the primitive types for an Archive, if they are templates, must be *fully saturated*. Meaning that:

  template<> lookup< std::vector<bool> >

is fine, but

  template<typename T> lookup< std::vector<T> >

is not. This condition prevents recursive lookup calls with non-primitives. This, I think, is going to be the key to compositionality later. More to come...

Does all of this make sense so far?

David
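One possible reading of the concept sketch above in compilable C++17 follows. It is an interpretation, not David's code. void_ stands in for mpl::void_ and std::function for function. example_archive, whose only primitive type is int, and is_primitive are hypothetical. As in the sketch, the write function takes RState& and the read function takes WState&.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>

struct void_ {};  // stand-in for mpl::void_

// Hypothetical model of the concept.
struct example_archive {
    struct RState { std::string buf; };                        // passed to writers
    struct WState { std::string buf; std::size_t pos = 0; };   // passed to readers

    // Primary template: T is not a primitive of this archive.
    template <class T>
    struct lookup {
        using type = void_;
        type operator()() const { return {}; }
    };
};

// Fully saturated specialization: int is primitive, witnessed by a
// (write, read) pair of functions.
template <>
struct example_archive::lookup<int> {
    using type = std::pair<std::function<void(example_archive::RState&, const int&)>,
                           std::function<int(example_archive::WState&)>>;
    type operator()() const {
        return {[](example_archive::RState& s, const int& x) {
                    s.buf += std::to_string(x) + ' ';  // space-separated text
                },
                [](example_archive::WState& s) {
                    std::size_t end = s.buf.find(' ', s.pos);
                    int x = std::stoi(s.buf.substr(s.pos, end - s.pos));
                    s.pos = end + 1;
                    return x;
                }};
    }
};

// True when lookup<T> yields a (write, read) pair rather than void_.
template <class A, class T>
constexpr bool is_primitive =
    !std::is_same_v<typename A::template lookup<T>::type, void_>;
```

A fully saturated specialization such as lookup<std::vector<bool>> could be added the same way as lookup<int>. By contrast, a partial specialization template<class T> struct example_archive::lookup<std::vector<T>> is what the saturation condition rules out.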

David Sankel wrote:
> Thanks for your detailed response Robert...
>> <snip>
>> To really do this right, I see the following as necessary:
>> a) Clarify and simplify the current archive concept. I've thought about this a lot and know what I want to do, but I'm not excited enough to do it.
> <snip>
> Does all of this make sense so far?
In all honesty, I didn't understand even one sentence of the above.

Robert Ramey

On Fri, Jan 14, 2011 at 6:17 PM, Robert Ramey <ramey@rrsd.com> wrote:
> David Sankel wrote:
>> <snip>
>> Does all of this make sense so far?
> In all honesty, I didn't understand even one sentence of the above.
Ah, darn. I have the terrible curse of thinking in a very precise and powerful language (denotational semantics with Agda as the semantic domain in this case), but I find it terribly difficult to translate it to prose. For the record, here is what I'm trying to translate:

  μ [[concept Archive]] =
    data Archive : Set where
      archive : (RState, WState : Set)
              → (l : List ( T : Set
                          , (RState, T) → IO RState
                          , WState → IO (WState, T) ))
              → UniqueFsts l
              → Archive

I'm going to think more about how to explain this and try again. Thanks for hanging in there.

David
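The data type above can be approximated in C++ as follows. An archive is a pair of state types together with a list of (type, writer, reader) triples whose first components are unique, which is what UniqueFsts l requires. This sketch is hypothetical, and archive_desc, add, write and read are invented names. IO is dropped: writers have the shape (RState, T) → RState and readers WState → (WState, T). Uniqueness is enforced by a runtime check in add rather than by a proof.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

template <class RState, class WState>
class archive_desc {
    // One (T, writer, reader) triple with T erased.
    struct entry {
        std::type_index type;
        std::any writer;  // holds std::function<RState(RState, const T&)>
        std::any reader;  // holds std::function<std::pair<WState, T>(WState)>
    };
    std::vector<entry> entries_;

    const entry* find(std::type_index t) const {
        for (const auto& e : entries_) if (e.type == t) return &e;
        return nullptr;
    }

public:
    // Adds a triple for T; refuses a duplicate T so first components stay unique.
    template <class T>
    bool add(std::function<RState(RState, const T&)> w,
             std::function<std::pair<WState, T>(WState)> r) {
        if (find(typeid(T))) return false;
        entries_.push_back({typeid(T), std::move(w), std::move(r)});
        return true;
    }

    template <class T>
    RState write(RState s, const T& x) const {
        const entry* e = find(typeid(T));
        assert(e && "T is not a primitive of this archive");
        using W = std::function<RState(RState, const T&)>;
        return std::any_cast<const W&>(e->writer)(std::move(s), x);
    }

    template <class T>
    std::pair<WState, T> read(WState s) const {
        const entry* e = find(typeid(T));
        assert(e && "T is not a primitive of this archive");
        using R = std::function<std::pair<WState, T>(WState)>;
        return std::any_cast<const R&>(e->reader)(std::move(s));
    }
};
```

Registering int a second time fails, which mirrors the UniqueFsts side condition.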
participants (2)
- David Sankel
- Robert Ramey