
----- Mensaje original ----- De: Peter Dimov <pdimov@mmltd.net> Fecha: Sábado, Mayo 5, 2007 12:17 pm Asunto: Re: [boost] Serialization support,Was: [BoostCon07][testing] Session reminder. Para: boost@lists.boost.org
Sohail Somani wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov
If your serialization methods
- allow creation of objects whose invariants do not hold, or - expose implementation details in the external representation,
What do you mean by the second line?
Consider for example a set<int>. The proper way to serialize it is as a sequence of values. This is what the user sees, and this external format is robust against changes in the implementation. If you serialize it as a tree of Node classes, this would make it quite hard to switch to a skip list representation later.
I'd like to join this intrusive vs. non-intrusive serialization discussion from my experience in providing serialization support for Boost.MultiIndex. As I see things, there are two related but different ways in which serialization support can be said to be intrusive: a) data intrusive: the stuff serialized reflects the internal structure of the class. The set<int> example proposed by Peter above illustrates a case of (unwise) data intrusive serialization. b) interface intrusive: the serialization algorithms cannot be implemented by using the class public interface alone. For instance, up to Boost 1.33 (if my memory serves me well), serialization of shared_ptrs was provided in an interface intrusive way because no non-intrusive approach was found --this has fortunately been corrected now. Usually, data intrusive serialization implies interface intrusive serialization, but *not* the other way around: Boost.MultiIndex serialization support is indeed interface intrusive but not data intrusive; let me explain why I had to do things this way. No one doubts non-intrusive serialization is the preferred approach, when feasible. Now consider the process of serializing a multi_index_container: we naturally want deserialization to reconstruct the order in which the elements were arranged in *every* index of the container (hashed indices excluded from this guarantee, as relying in the arbitrary order they provide is unsound to begin with). If we follow the natural approach of saving the sequence of values as traversed by some of the indices (let's say index #0), this guarantee is not held, let's see the problems encountered with each index type: * Ordered indices: no problems with *unique* ordered indices, since the order there is strictly determined by the values contained. But what about non-unique ordered indices? Consider this container: multi_index_container< int, indexed_by< ordered_non_unique<int_modulo_3_extractor>, ordered_non_unique<int_modulo_3_extractor> >
where the elements are deemed equivalent if they have the same value modulo 3. Now suppose we have the values 0,3 and 6 in the container, and index #0 lists them in the following order: 0 3 6 What can we say from this info about the traversal when done through index #1? The answer is: absolutely nothing, whatever permutation of these elements could be validly exposed by index #1. These variations from index #0 result from the fact that the values could have been inserted with hinted insert() through either of both indices. The final state is a convoluted function of the insertion history. * Sequenced and random-access indices: here it is even clearer that the order maintained by these indices cannot be inferred by those of other indices, since after insertion elements can be freely relocated. So, in order to provide our desired level of functionality (perfect reconstruction of every index traversal order) we cannot just save the elements as traversed by index #0, but we must somehow codify the variations from every other index wrt to the traversal order implied by index #0, when these variations are not unique --this is indeed what's done in an efficient way, using LIS (longest increasing subsequence) algorithms. So far, this approach is not data intrusive, since the information stored does not reveal any particular implementation detail and reflects only the nature of traversal orders allowable by index semantics. Now the remaining question is: can we use this info to reconstruct the traversal orders in a non interface intrusive way, i.e. by resorting only to the public interface of each index type? * Ordered indices: No; once an element is inserted into a multi-index container, there is no way (with public interface methods) of changing its order with respect to other equivalent elements in a non-unique ordered index, short of extracting and reinserting the element again, which, besides being terribly inefficient, destroys the relative element position gotten so far in other indices, in a sort of cacth-22 situation. * Sequenced indices: Yes. splice() (or relocate()) facilities can be used to alter the traversal order of a sequenced index without touching the rest of indices. * Random-access indices: Yes, with problems. splice() member functions are available here as for sequenced indices, but relocating elements around is a O(n) operation, implying a O(n*n) complexity for the task of reconstructing the entire index. An interface intrusive scheme can do this in linear time. So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. The morale of the story is: for rich-state classes where the exact state of an object depends heavily on its past history, non-intrusive serialization can be either algorithmically unfeasible (it is hard or impossible to reconstruct the history from the current state) or potentially less efficient than a interface intrusive approach. This does not mean that the class interface or the serialization support implementation are "broken". Just my 2c, sorry about the long post. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo