Re: [boost] [serialize] Handling multiple files, deferred marshalling, and progressive loading

4 Jul 2006

Hi Robert,

Sorry if I'm missing something, but your previous answer left me a bit
confused. Could you please elaborate?

Just so that we're on the same page: I'm using shared_ptr throughout my
application. For instance, I have a resource that points to another one,
both resources being stored in a manager (even though the resources are of
different types, they all derive from the same base class). Everywhere, we
use shared_ptr. I use a lot of these links to express dependency and
hierarchy. I don't see how I could use another approach, as I need to be
able to change the hierarchy or the dependencies dynamically.
So what I really need is to be able to load a resource of type USED,
have it used by some resources of type USING (there is a lot of sharing),
and then later load an additional resource of type USING and have that
last one linked to the already loaded resource of type USED.

Does that make sense?

Cheers

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On
behalf of Robert Ramey
Sent: Tuesday, 4 July 2006 17:16
To: boost@lists.boost.org
Subject: Re: [boost] [serialize] Handling multiple files, deferred
marshalling, and progressive loading

SeskaPeel wrote:
...
Hi Robert,
...
...
1/ First, we are using multiple files, which can be created at
different moments. The main reason is that we want to be able to
share resources contained in one file across multiple projects. An
alarm rang in my head telling me that serialize won't be able to
handle this case,
why not?
Because I found nothing in the docs about how serialize would restore
pointers when loading from multiple files. Suppose I load file1, which
contains a resource named "r1". After file1 is fully loaded, say
5 minutes later, I load file2. Inside that file there is a resource
"r2" that needs a link to "r1". How will this case be handled?
It's not a case I ever considered, so offhand I'm not sure how to answer.
But I can speculate a little. Take two cases:

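#include <boost/serialization/access.hpp>

class x;  // some serializable type defined elsewhere

class a {
    friend class boost::serialization::access;
    // ...
    x * ptr;   // pointer: loading creates a new x
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar >> ptr;
    }
};

class b {
    friend class boost::serialization::access;
    x & ref;   // reference: the x must already exist; its data is loaded
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar >> ref;
    }
public:
    explicit b(x & r) : ref(r) {}  // a reference member must be bound on construction
};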

Both cases are quite similar: one object refers to some external
object. In the first case a new object of type x is created each
time an instance of a is loaded. In the second case the object
is presumed to exist and the data is just loaded into it. So
the serialization library sort of presumes that one is using
pointers and references in this common way. This is just
a decision I made. Most people don't notice it, as it turns
out this is the way people use pointers vs. references,
so things "just work" as one expects. By making the
serialization "fancier", one could change the behavior
for pointers to work like the current behavior for
references, which sounds like something you want to do
to address your situation. Or you could just change
the pointers to references in your own code and get
everything to "just work". It's up to you.
...
I suppose I'll have to manually iterate over the freshly loaded resources
and check whether they need to be "post-load associated". As I can't
know whether these resources need this last step or not, I'll have to
check each time I load a file; thus I could handle the internal
linking in this step as well, though it would be easier (and more
efficient) if it were done by the loading library.
Of course you could do it that way as well.
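The "post-load association" pass described above could look roughly like this. It is a minimal sketch in standard C++, assuming a hypothetical Resource type that records its dependency by name during loading, plus a name-keyed registry shared across all loaded files; associate() is likewise a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical resource: after deserialization only the *name* of the
// dependency is known; the shared_ptr link is filled in by a later pass.
struct Resource {
    std::string name;
    std::string pending_dependency;        // empty if there is no link
    std::shared_ptr<Resource> dependency;  // resolved link
};

using Registry = std::map<std::string, std::shared_ptr<Resource>>;

// Registers the freshly loaded resources, then resolves every pending link
// against everything loaded so far (this file or earlier ones).
// Returns the dependency names that could not be resolved yet.
std::vector<std::string> associate(Registry & registry,
                                   const std::vector<std::shared_ptr<Resource>> & fresh) {
    for (const auto & r : fresh)
        registry[r->name] = r;
    std::vector<std::string> unresolved;
    for (const auto & r : fresh) {
        if (r->pending_dependency.empty())
            continue;
        auto it = registry.find(r->pending_dependency);
        if (it == registry.end()) {
            unresolved.push_back(r->pending_dependency);
        } else {
            r->dependency = it->second;
            r->pending_dependency.clear();
        }
    }
    return unresolved;
}
```

Returning the unresolved names leaves it to the caller to decide whether a missing dependency is an error or should wait for a file loaded later.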
...
So, is there something I misunderstood, or some feature I overlooked?
Hmmm - I suspect that since serialization works so painlessly ALMOST
all the time, one doesn't have much occasion to look under the hood
until things get more complex and specialized. I think that at that
point it's valuable to take another look at what one is trying to serialize
and ask oneself - hmmmm - why is this hard? Going through
and considering each variable as to whether it should be a pointer,
reference, pointer to a "const" object, reference to a "const" object,
a "const" pointer to a non-const object, or just a normal member
variable (const or not) etc. will often lead to a re-characterization
of some variables, and then things will again "just work". I also
believe that it will improve the rest of one's code, as it forces one
to think about why each variable is used the way it is. Also,
the exercise will end up giving the compiler more information
about how each variable is used, which can permit the compiler
to better optimize the generated code (at least theoretically). FWIW
I believe that "const" is generally under-appreciated and under-used.
...
...
...
2/ And secondly, we want to be able to load files progressively, say
100KB at a time. Once the file is completely loaded, the pointer
restoration can happen. Does serialize support such a feature, or plan
to? If this is not a lot of work, is there a way to help get this
feature done quickly?
If you're in a hurry - consider the prescription above.  It will require
changing your own classes - for the better in my view - but you'll
be done with it.
...
Not necessarily from a thread, but the aim is to suspend the loading
of a file and then resume it some time later. Even better would be
the ability to specify how long, or how many bytes should be read,
before the loading function suspends and returns.
I read about your custom archives (some time ago, I have to admit),
and it wasn't evident to me that I could implement this
feature with them. Could you provide some more hints?
The deserialization process uses the stack to store its state. Hence,
the only simple and practical way to do this is to invoke
loading on a separate fiber, coroutine or thread.
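One way to picture the thread option is a minimal sketch in standard C++11, assuming a hypothetical ChunkedSource type and a toy load_ints parser standing in for a real deserializer. No Boost.Serialization API is used. The loader thread blocks whenever it runs out of bytes, so its local variables (its "state on the stack") survive until the caller feeds the next chunk, for example the next 100KB.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical byte source shared between the caller and a loader thread.
class ChunkedSource {
public:
    // Called by the owning thread whenever the next chunk is available.
    void feed(const std::string & chunk) {
        {
            std::lock_guard<std::mutex> lock(m_);
            buf_.insert(buf_.end(), chunk.begin(), chunk.end());
        }
        cv_.notify_one();
    }
    // Signals that no more data will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_one();
    }
    // Blocks until a byte is available; returns false at end of data.
    bool get(char & c) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !buf_.empty() || closed_; });
        if (buf_.empty())
            return false;
        c = buf_.front();
        buf_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<char> buf_;
    bool closed_ = false;
};

// Toy "deserializer": parses space-separated integers. current and
// in_number are kept on this thread's stack while it waits for more bytes.
std::vector<int> load_ints(ChunkedSource & src) {
    std::vector<int> out;
    int current = 0;
    bool in_number = false;
    char c;
    while (src.get(c)) {
        if (c >= '0' && c <= '9') {
            current = current * 10 + (c - '0');
            in_number = true;
        } else if (in_number) {
            out.push_back(current);
            current = 0;
            in_number = false;
        }
    }
    if (in_number)
        out.push_back(current);
    return out;
}
```

Locking per byte is slow; a real version would more likely hand whole buffers to a std::streambuf subclass that an archive could read from, but the suspend/resume mechanism would be the same.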

If you want to hack your own code some, you could do something like
having a top-level array of serializable objects and serialize each of them
independently, so you could do the process piece by piece. The
serialization library does permit the same streambuf to be used
and passed around, so that serialization can be "embedded" inside
other streambuf operations, etc. But by far the easiest
way would be to use the coroutine approach above.
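A sketch of the piece-by-piece alternative, assuming each top-level object is written independently into one stream, preceded by a count. IncrementalLoader, Record and step() are hypothetical names, and plain stream extraction stands in for loading each object from an archive. Each call to step() reads at most a given number of objects and returns, which approximates the "suspend after N" behavior asked about (counted in objects rather than bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical top-level record, written independently of the others.
struct Record {
    std::string name;
    int size = 0;
};

// Loads a count-prefixed sequence of records a few at a time. The only state
// kept between calls is the stream position and the number still to read.
class IncrementalLoader {
public:
    explicit IncrementalLoader(std::istream & is) : is_(is) { is_ >> remaining_; }

    // Reads at most max_records records; returns true once everything is loaded.
    bool step(std::size_t max_records) {
        for (std::size_t i = 0; i < max_records && remaining_ > 0; ++i) {
            Record r;
            if (!(is_ >> r.name >> r.size)) {  // truncated or corrupt input
                remaining_ = 0;
                failed_ = true;
                break;
            }
            loaded_.push_back(r);
            --remaining_;
        }
        return remaining_ == 0;
    }
    const std::vector<Record> & loaded() const { return loaded_; }
    bool failed() const { return failed_; }

private:
    std::istream & is_;
    std::size_t remaining_ = 0;
    std::vector<Record> loaded_;
    bool failed_ = false;
};
```

The design only works because no state spans records; pointers between records would still need one of the association approaches discussed elsewhere in this thread.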
...
...
...
Today, we have a manual phase of association that is called after
all files are loaded. Each time we load a new file, we "post-load"
all the resources contained in it, and pointer restoration is
achieved this way.
Using my suggestion above, this wouldn't be necessary. The commonly
referenced objects are "pre-created and registered" by the top-level
object constructors, and everything is kept in sync automatically throughout
the serialization process.
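To make the "pre-created and registered" idea concrete for the r1/r2 case, here is a minimal sketch in plain standard C++, not Boost.Serialization code. ResourceManager, get_or_create() and the one-line-per-resource text format are hypothetical. Every resource is looked up by name in one registry, so r2 in a file loaded later links to the r1 already in memory, and a forward reference creates the object that a later file fills in.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

struct Resource {
    std::string name;
    int payload = 0;
    std::shared_ptr<Resource> dependency;
};

// Hypothetical manager owning every resource by name. Whoever asks for a
// name first creates (registers) the object; everyone else, including files
// loaded later, gets that same object.
class ResourceManager {
public:
    std::shared_ptr<Resource> get_or_create(const std::string & name) {
        auto & slot = resources_[name];
        if (!slot) {
            slot = std::make_shared<Resource>();
            slot->name = name;
        }
        return slot;
    }

    // Loads lines of the form "name payload dependency" ("-" for none).
    // Data is loaded *into* the registered object (reference semantics) and
    // links are resolved immediately, so no post-load pass is needed.
    void load(std::istream & is) {
        std::string name, dep;
        int payload = 0;
        while (is >> name >> payload >> dep) {
            auto r = get_or_create(name);
            r->payload = payload;
            if (dep == "-")
                r->dependency.reset();
            else
                r->dependency = get_or_create(dep);
        }
    }

private:
    std::map<std::string, std::shared_ptr<Resource>> resources_;
};
```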
...
...
In the serialization library, pointers are restored "on the fly"
(depth first).
Yes, that's why I'm considering porting to it :) What does "depth
first" mean?
a contains a pointer to b, which contains a pointer to c, etc., so the
sequence of operations is a, b, c, ... . That is, a is complete
only when all its components have been loaded.
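A toy illustration of that order in plain standard C++. Node and load_chain are hypothetical, and this only models the order of operations, not the library's internals. Loading a starts a, which loads b, which loads c, so a is started first but completed last.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Each node owns a pointer to the next one, like a -> b -> c above.
struct Node {
    std::string name;
    std::unique_ptr<Node> next;
};

// Builds names[i], names[i+1], ... recursively, recording when each node is
// started and when it is complete (all of its components loaded).
std::unique_ptr<Node> load_chain(const std::vector<std::string> & names, std::size_t i,
                                 std::vector<std::string> & started,
                                 std::vector<std::string> & completed) {
    if (i == names.size())
        return nullptr;
    auto n = std::make_unique<Node>();
    n->name = names[i];
    started.push_back(n->name);
    n->next = load_chain(names, i + 1, started, completed);  // depth first
    completed.push_back(n->name);
    return n;
}
```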

Robert Ramey
